Estimating Fat Components of Potato Chips Using Visible and Near-Infrared Spectroscopy and a Compositional Calibration Model

Assessing the fat composition of commercial potato chip products is challenging because of their diversity and the difficulty of verifying the nutritional label of each batch by official methods. Alternative technologies are therefore of great interest to both the industry and the public administration. Near-infrared spectroscopy (NIRS) is a rapid and non-destructive technique that has proven useful in different applications in the food industry. However, suitable treatments of compositional reference data in NIRS methods have so far been very scarce in the literature. Nutritional label information is commonly given as percentage content values across several nutritional categories. This formally corresponds to the class of so-called compositional data, for which there are specific statistical methods. This study contributes to ongoing research on the feasibility of Vis/NIR spectroscopy for food nutritional labelling. In particular, a calibration model is formulated to estimate the relative content of fat in potato chip products from the Vis/NIR spectral signal, integrating a consistent statistical treatment of the nutritional reference data. The method provides accurate estimates of the fat composition, including saturated, monounsaturated, and polyunsaturated fat types, as well as the total fat percentage (cross-validated overall R² = 0.88 and R² = 0.82 from ground and fragmented samples respectively), and shows its potential for both nutritional labelling and verification in a rapid and inexpensive manner.


Introduction
Nutritional label information typically comprises numerical values referring to the contents across a number of classes of nutritional components. In the European Union, it has been compulsory since December 2016, and the published standard (CE 2011) obliges producers to provide nutritional information as part of the food information to the consumer. Likewise, it is important to verify the compliance of imported products with the established nutritional labelling rules. The mandatory nutritional information covers total fat, saturated fatty acids (SFA), carbohydrates, sugars, proteins and salt, as well as the energy value. Information on other nutrients, such as monounsaturated fatty acids (MUFA), polyunsaturated fatty acids (PUFA), trans-fatty acids, vitamins, and other healthy compounds, is voluntary according to the standard. Other possibly relevant compounds, like cholesterol, can also be provided voluntarily.
Total fat content and the distinction between SFA, PUFA and MUFA types of fat are relevant in potato chips products. To date, the information most commonly included by producers is total fat and saturated fat, although some products also provide information about MUFA and PUFA. SFA are fatty acids (FA) without any unsaturation in their carbon chain, whereas MUFA are those with a single unsaturation. Oleic acid (C18:1) and palmitoleic acid (C16:1) are the major MUFA in chips. The unsaturation of oleic acid is located after carbon 9, and it is therefore commonly called an ω-9 fatty acid. PUFA contain more than one double bond in their backbone. The unfavourable effects of SFA on cardiovascular health are widely proven, as well as the positive effects of MUFA (Hernáez et al. 2017;Schwingshackl and Hoffmann 2014). Good human health also requires a minimum dietary intake of the essential FA linoleic acid (C18:2, ω-6) and linolenic acid (C18:3, ω-3). Linoleic acid is often present in vegetable oils, which may also include palmitic acid (C16:0), stearic acid (C18:0), and small quantities of other fatty acids such as arachidic (C20:0) and behenic (C22:0) acids.
In practice, analysing all batches of foodstuff by the official methods currently applied is challenging for different reasons. Therefore, both industry and public administration can greatly benefit from the use of rapid technologies for nutritional labelling and verification. Near-infrared spectroscopy (NIRS) is a fast and non-destructive technique widely used in industries such as the food, pharmaceutical and petrochemical industries, amongst others (Flinn 2005;Norris 2005). NIRS is environmentally friendly since it does not need solvents or reagents (Cayuela and García 2017). The suitability of NIRS for quality and composition analysis of potato chips has been reported by several authors (Shiroma and Rodríguez-Solona 2009;Pedreschi et al. 2010;Suthatta et al. 2017). Fernández-Cabañas et al. (2011) report the determination of fatty acid profiles for nutritional labelling of pork dry-cured sausages. However, the use of NIRS for the specific purpose of food nutritional labelling is almost absent from the literature.
NIRS quantitative methods generally aim to correlate the acquired spectra with reference data. In the context of nutritional labelling, the latter are typically expressed as percentages, mg g⁻¹, ppm, mg kg⁻¹ or analogous relative units. Percentage data account for the relative content across nutritional classes (the parts of a composition), and they live in the numerical space of positive values adding up to 100 that is formally known as the simplex. This fact characterises them as so-called compositional data (Aitchison 1986;Pawlowsky-Glahn et al. 2015), whose key feature is that they carry relative information which does not depend on an arbitrary scale of measurement. Compositional data are naturally constrained (percentages range between 0 and 100) and inter-dependent (an increase in the percent content of one part of the composition is necessarily compensated numerically by a decrease in at least one other part). These special features make the direct use of conventional statistical and chemometric methods prone to distortions in the data analysis that may lead to misleading results and scientific conclusions. These include negative bias in correlation measures, singularity of the covariance matrix, predictions outside the possible range, and results dependent on the measurement units (Aitchison 1986;Pawlowsky-Glahn et al. 2015). Aitchison (1986) introduced log-ratio analysis (LRA) as a consistent and well-principled statistical approach for the analysis of compositional data that deals with the above issues. This methodological approach has been further developed in recent decades and successfully introduced in varied scientific fields such as geochemistry, molecular biology or time-use epidemiology, where analogous data formats are used.
In brief, the LRA approach focuses on (log-)ratios between the parts of the composition. Log-ratio transformations allow the original (constrained) data on the simplex to be represented as (unconstrained) coordinates in ordinary real space, where standard statistical methods and models can be applied. Once the analysis is conducted on coordinates, the results can be translated back in terms of the original composition by inverse log-ratio transformation. The compositional treatment of food nutritional data in NIRS-based chemometric calibration has only been considered recently (Cayuela-Sánchez et al. 2019;Cayuela-Sánchez et al. 2020).
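The scale invariance of log-ratios mentioned above can be illustrated with a minimal Python sketch. The amounts below are hypothetical and the helper `closure` is an illustrative name, not part of the study's software.

```python
import math

def closure(parts, total):
    """Rescale positive parts so that they add up to `total`."""
    s = sum(parts)
    return [total * p / s for p in parts]

# Hypothetical amounts of SFA, MUFA and PUFA (arbitrary units, not study data).
amounts = [11.0, 60.0, 29.0]

as_percent = closure(amounts, 100.0)   # parts adding up to 100
as_fraction = closure(amounts, 1.0)    # parts adding up to 1

# Log-ratio of MUFA to PUFA under both scalings: the scale cancels out.
lr_percent = math.log(as_percent[1] / as_percent[2])
lr_fraction = math.log(as_fraction[1] / as_fraction[2])

# A log-ratio can take any real value, so it is not confined to [0, 100].
lr_sfa_vs_mufa = math.log(as_percent[0] / as_percent[1])
```

The two MUFA/PUFA log-ratios coincide whether the composition is expressed in percentages or in proportions, which is the sense in which compositional data carry only relative information.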
The present study entails the use of Vis/NIR spectroscopy together with a tailored compositional statistical treatment of reference data for the estimation of fat components in the nutritional label of potato chips products based on fragmented and ground preparations. In the following sections, we provide details about the study setting, sample collection, NIRS processing, reference analysis, chemometric calibration, results and model performance, and conclude with some final remarks.

Study samples
The nutritional composition of potato chips products depends much more on the oils used in frying than on the raw material itself. Therefore, beyond a certain number of samples, additional products become repetitive in type. The potato chips samples used in this study were purchased from two local supermarkets in Seville (Spain) on two different dates. The range of frying oils shown on the labels of the collected samples was limited, including regular sunflower oil, high oleic sunflower oil, olive oil, and 'seeds oil'. This latter category corresponds to non-explicit seed vegetable oils, according to the technical sanitary regulations for edible vegetable oils for the Spanish market (Presidencia del Gobierno 1893). Considering the above, the samples used were deemed representative of the diversity amongst the most common commercial types of potato chips available in the local market. The final set comprised fifteen potato chips samples from different Spanish manufacturers. Two replicates of each of them were processed, leading to a final reference data set consisting of 30 sample replicates.

Spectra acquisition
The spectra from the sample replicates were registered with a Labspec spectrometer (Analytical Spectral Devices Inc., Boulder, USA). Labspec has separate detectors for different wavelength ranges. The range 350-1000 nm, including a short interval in the ultraviolet (350-380 nm) and the visible (400-780 nm) ranges, as well as a part of the near-infrared (NIR) wavelengths (781-1000 nm), was analysed by a fixed reflective holographic diode array with a sensitivity of 512 pixels. A holographic fast scanner InGaAs detector cooled at −25 °C covered the NIR wavelength range of 1000-1800 nm. The same detector coupled with a high order blocking filter analysed the NIR interval 1800-2500 nm. The spectrometer had an internal shutter and automatic offset correction. The scanning speed was 100 ms, and the repeatability was 6.00 × 10⁻⁴ cm⁻¹ mol⁻¹ (standard deviation on the average absorbance, between 350 and 2500 nm, of five measures of a white tile).
The spectra were registered in reflectance mode. Two alternative settings were prepared: (1) fragmented and (2) ground chips products. For this, the samples were first fragmented to make their size and shape more homogeneous, and then two replicate scans from each sample replicate were registered. Afterwards, the samples were ground into powder with a manual mortar and the corresponding spectra from two scans were registered in the same manner.
Registration of spectra was conducted using a 'Sample Turn Table' accessory (Analytical Spectral Devices Inc., Boulder, USA) with standard SMA 905 optical fibre connectors, while the samples were held in a quartz Petri dish. The software of the spectrometer was set up to register 10 scans with continuous acquisition, which were automatically averaged to form a single spectrum. The registering process was controlled by Indico Pro software (Analytical Spectral Devices Inc., Boulder, USA) with a spectral resolution of 1 nm. Two replicate spectra from each sample were registered. The registering process was the same for both fragmented and ground preparations and took less than a minute for each chip product, all steps included.

Reference analysis
Reference analysis was the same for the fragmented and ground preparations and was applied separately to the 30 final sample replicates in each case (15 samples × 2 replicates each).

Fat content
The extraction of the oil content of each ground replicate was conducted by the Soxhlet method, using hexane as the solvent and paper cartridges of diameter 35 mm and length 75 mm. The extraction time was 5.5 h, using 250 mL Soxhlet extractor units. Vacuum extraction on a rotary evaporator removed the hexane remaining in the fat. A water bath at 40 °C and tap water circulating through the refrigerant circuit allowed a suitable temperature gradient. Fat content was measured gravimetrically by weighing the flasks containing each sample once at room temperature. It was expressed as total fat percentage (TF) of the product. Then, each fat sample was held in 4 mL vials, hermetically closed, and stored at 4 °C in preparation for FA composition analysis. This latter was performed within a week after the extraction of the fat samples.

Fatty acids composition
The FA compositions were analysed by gas chromatography (GC) according to the IUPAC Standard Method (IUPAC 1987) as fatty acid methyl esters (FAME). An aliquot of 100 to 130 mg of the fat sample from the potato chips was dissolved in 2 mL heptane. Then, the sample was trans-esterified using 500 μL of 2 N methanolic potassium hydroxide. The supernatant was collected after decanting and transferred to 1.5 mL vials for analysis. GC analysis was performed using an Agilent 7697A gas chromatograph (Agilent Technologies, Santa Clara, USA). A capillary column was employed: poly(90% biscyanopropyl-10% cyanopropylphenyl) siloxane, 60 m × 0.25 mm internal diameter, and 0.20 μm film thickness. A flame ionization detector (FID) with automatic split injection was used (injection volume equal to 1 μL). Hydrogen was the carrier gas at a flow rate of 1 mL min⁻¹. The injector and detector were set at temperatures of 225 °C and 250 °C respectively. The oven was programmed with an initial temperature of 180 °C (10 min), increasing at a rate of 3 °C min⁻¹ up to 220 °C (10 min).

Chemometric pre-processing
The replicate spectra from each sample replicate were first averaged, and then the wavelength resolution of the spectra was reduced to a window width of 8 nm. This procedure was the same for fragmented and ground samples. The resulting reflectance data were normalized by applying standard mean normalization. Subsequently, they were transformed into absorbance and treated by the Savitzky-Golay first derivative method with polynomial order 2 and 3 smoothing points. The adequacy of these treatments has been previously reported (Cayuela et al. 2015). Spectral data pre-processing was carried out using The Unscrambler 9.7 (CAMO Software AS, Norway).
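The pre-processing chain described above can be sketched in plain Python as follows. This is only an illustration under stated assumptions, not the routine implemented in The Unscrambler: the resolution reduction is assumed to be a block average over 8 consecutive 1-nm points, mean normalization is assumed to divide each spectrum by its own mean, absorbance is taken as log10(1/R), and the Savitzky-Golay first derivative with 3 points and polynomial order 2 is computed as the central difference to which it reduces on an evenly spaced grid. All function names and the reflectance values are hypothetical.

```python
import math

def average_spectra(spectra):
    """Point-wise mean of replicate spectra of equal length."""
    n = len(spectra)
    return [sum(values) / n for values in zip(*spectra)]

def bin_spectrum(values, width=8):
    """Average consecutive blocks of `width` points (trailing remainder dropped)."""
    n_bins = len(values) // width
    return [sum(values[i * width:(i + 1) * width]) / width for i in range(n_bins)]

def mean_normalize(values):
    """Divide a spectrum by its own mean (assumed meaning of mean normalization)."""
    mean = sum(values) / len(values)
    return [v / mean for v in values]

def to_absorbance(reflectance):
    """Convert reflectance R to absorbance as log10(1/R)."""
    return [math.log10(1.0 / r) for r in reflectance]

def first_derivative_3pt(values, spacing):
    """3-point, order-2 Savitzky-Golay first derivative (a central difference).
    The two end points are dropped."""
    return [(values[i + 1] - values[i - 1]) / (2.0 * spacing)
            for i in range(1, len(values) - 1)]

# Two hypothetical replicate scans of 32 reflectance points at 1 nm spacing.
scan_a = [0.40 + 0.005 * i for i in range(32)]
scan_b = [0.42 + 0.005 * i for i in range(32)]

averaged = average_spectra([scan_a, scan_b])
binned = bin_spectrum(averaged, width=8)          # 4 points, 8 nm apart
normalized = mean_normalize(binned)
absorbance = to_absorbance(normalized)
derivative = first_derivative_3pt(absorbance, spacing=8.0)
```

With real spectra the same sequence would be applied to each sample replicate, producing one derivative spectrum per row of the calibration matrix.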
A preliminary assessment of possible patterns and discrepancies in the Vis/NIR spectral data was conducted by principal component analysis (PCA). PCA reduces the spectral variables by finding optimal linear combinations (principal components) of them that successively account for decreasing amounts of total variability in the original data. In particular, the values of the first two principal components (PCA scores, accounting for the highest portion of the original data variability) were used to represent the spectral data as points in an ordinary 2-dimensional scatterplot (PCA scores plot). This included a 95% confidence ellipse to help identify any potential outlying spectral profiles.
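The computation of the first two principal components and their scores can be sketched as below. This is a didactic sketch only: it uses power iteration with deflation on the covariance matrix instead of the algorithm of any particular chemometrics package, the toy data are hypothetical, and the 95% confidence ellipse is not included.

```python
import math

def center_columns(rows):
    """Subtract the column means from every row."""
    n = len(rows)
    means = [sum(col) / n for col in zip(*rows)]
    return [[v - m for v, m in zip(row, means)] for row in rows]

def covariance(centered):
    """Sample covariance matrix of column-centred data."""
    n, p = len(centered), len(centered[0])
    return [[sum(r[i] * r[j] for r in centered) / (n - 1) for j in range(p)]
            for i in range(p)]

def mat_vec(matrix, vector):
    return [sum(a * b for a, b in zip(row, vector)) for row in matrix]

def leading_eigenpair(matrix, iterations=500):
    """Largest eigenvalue and unit eigenvector by power iteration."""
    p = len(matrix)
    v = [1.0 / math.sqrt(p)] * p
    for _ in range(iterations):
        w = mat_vec(matrix, v)
        norm = math.sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]
    value = sum(a * b for a, b in zip(v, mat_vec(matrix, v)))
    return value, v

def deflate(matrix, value, vector):
    """Remove the contribution of an eigenpair from a symmetric matrix."""
    p = len(matrix)
    return [[matrix[i][j] - value * vector[i] * vector[j] for j in range(p)]
            for i in range(p)]

# Hypothetical data: 5 spectra described by 3 variables (not study data).
spectra = [[1.0, 2.0, 0.5], [2.0, 4.1, 0.4], [3.0, 5.9, 0.6],
           [4.0, 8.2, 0.5], [5.0, 9.8, 0.4]]

centered = center_columns(spectra)
cov = covariance(centered)
total_variance = sum(cov[i][i] for i in range(len(cov)))

value1, pc1 = leading_eigenpair(cov)
value2, pc2 = leading_eigenpair(deflate(cov, value1, pc1))

# Scores: coordinates of each spectrum on PC1 and PC2 (the points of a scores plot).
scores = [(sum(a * b for a, b in zip(row, pc1)),
           sum(a * b for a, b in zip(row, pc2))) for row in centered]
explained = (value1 / total_variance, value2 / total_variance)
```

The pair `explained` gives the share of total variability carried by each of the two components, which is the quantity usually reported alongside a scores plot.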

Compositional calibration modelling
To maximize the use of the available data and provide more realistic measures of the calibration model performance, the data (for both fragmented and ground preparations separately) were sequentially split into train and validation data sets following an ordinary cross-validation procedure (see e.g. Kuhn and Johnson 2013). In particular, we considered tenfold cross-validation, so that the sample replicates were randomly allocated to 10 equal size subsets and then nine of them were used to fit the model (training set) and the remaining one was used to test the performance of the model independently (validation set). This was repeated 10 times with each of the subsets used in turn exactly once as validation set. That is, at each of the 10 rounds, 27 sample replicates were used for training and 3 for validation.
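The partition described above (30 sample replicates, 10 folds, 27 for training and 3 for validation at each round) can be sketched as follows. The function name and the random seed are assumptions made for reproducibility of the illustration; they are not taken from the study.

```python
import random

def kfold_indices(n_samples, n_folds, seed=0):
    """Randomly allocate sample indices to folds and return (train, validation) pairs."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    # Every n_folds-th shuffled index goes to the same fold.
    folds = [indices[i::n_folds] for i in range(n_folds)]
    splits = []
    for k in range(n_folds):
        validation = sorted(folds[k])
        training = sorted(i for j, fold in enumerate(folds) if j != k for i in fold)
        splits.append((training, validation))
    return splits

# 30 sample replicates split into 10 folds of 3 replicates each.
splits = kfold_indices(30, 10)
```

Each replicate appears in exactly one validation set, so every observation contributes once to the cross-validated performance estimates.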
Calibration models were fitted to estimate the basic 3-part fat composition (SFA, MUFA, and PUFA) and the TF percentage content separately from the processed Vis/NIR spectral data. Following the LRA approach, the fat composition data were transformed into log-ratio coordinates. These were used as multivariate response in a partial least squares regression model (PLS2) fitted using the ordinary kernel algorithm. In particular, we considered isometric log-ratio (ILR) coordinates (Pawlowsky-Glahn et al. 2015), by which the 3-part composition was represented by two real-valued coordinates defined as follows:

$$\mathrm{ILR}_1 = \sqrt{\frac{2}{3}}\,\ln\frac{\mathrm{SFA}}{\sqrt{\mathrm{PUFA}\cdot\mathrm{MUFA}}} \tag{1}$$

$$\mathrm{ILR}_2 = \sqrt{\frac{1}{2}}\,\ln\frac{\mathrm{PUFA}}{\mathrm{MUFA}} \tag{2}$$

In this case, the first ILR coordinate (1) represents the balance or trade-off of SFA against PUFA and MUFA contents in the FA composition, whereas the second ILR coordinate (2) refers to the balance between PUFA and MUFA.
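The two coordinates described above (the balance of SFA against PUFA and MUFA, and the balance between PUFA and MUFA) can be sketched in Python, assuming the usual normalised balance form of ILR coordinates. The function name and the example composition are hypothetical.

```python
import math

def ilr_fat(sfa, mufa, pufa):
    """ILR coordinates of a 3-part fat composition (assumed balance form).

    z1: balance of SFA against the geometric mean of PUFA and MUFA.
    z2: balance of PUFA against MUFA.
    """
    z1 = math.sqrt(2.0 / 3.0) * math.log(sfa / math.sqrt(pufa * mufa))
    z2 = math.sqrt(1.0 / 2.0) * math.log(pufa / mufa)
    return z1, z2

# Hypothetical composition in percentages (not study data).
z1, z2 = ilr_fat(10.0, 60.0, 30.0)

# The same composition expressed as proportions gives the same coordinates.
z1_prop, z2_prop = ilr_fat(0.10, 0.60, 0.30)
```

Both coordinates are negative here because SFA is the minor part and PUFA is smaller than MUFA; a composition with three equal parts would map to (0, 0).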
The total fat percentage (TF) was treated as a separate 2-part composition (TF, 100 − TF), i.e. a composition formed by the percentage of TF along with the complementary percentage of the remaining contents, and thus represented as a single ILR coordinate given by

$$\mathrm{ILR}_{\mathrm{TF}} = \sqrt{\frac{1}{2}}\,\ln\frac{\mathrm{TF}}{100-\mathrm{TF}} \tag{3}$$

Another univariate response partial least squares calibration model (PLS1) was then fitted to this ILR coordinate on the spectral data.
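For the 2-part composition (TF, 100 − TF), the single ILR coordinate and its inverse can be sketched as below, again assuming the usual normalised balance form. The function names and the TF value of 35% are hypothetical.

```python
import math

def ilr_total_fat(tf_percent):
    """ILR coordinate of the 2-part composition (TF, 100 - TF)."""
    return math.sqrt(0.5) * math.log(tf_percent / (100.0 - tf_percent))

def inverse_ilr_total_fat(z):
    """Map an ILR coordinate back to a total fat percentage in (0, 100)."""
    ratio = math.exp(math.sqrt(2.0) * z)   # equals TF / (100 - TF)
    return 100.0 * ratio / (1.0 + ratio)

z_tf = ilr_total_fat(35.0)
tf_back = inverse_ilr_total_fat(z_tf)
```

Because the inverse is a logistic-type function, any real-valued prediction on the ILR scale maps back to a percentage strictly between 0 and 100.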
In brief, the PLS models projected the data onto low dimensions by finding a few linear combinations (PLS components) of the spectral data with the highest correlation to the ILR coordinates. The optimal number of PLS components for each model was determined internally by cross-validation based on the coefficient of determination (R²) and root mean square error (RMSE), choosing the most parsimonious model amongst those reaching comparable best predictive performance according to the one standard error rule (Kuhn and Johnson 2013). These calculations were conducted using the packages compositions, caret and pls on the R system for statistical computing (R Core Team 2020).
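The one standard error rule can be sketched as follows: among all candidate numbers of components, the smallest one whose cross-validated error is within one standard error of the best error is selected. The function name and the RMSE values are hypothetical, and the sketch does not reproduce the internals of the caret package.

```python
def one_standard_error_choice(rmse_mean, rmse_se):
    """Return the smallest number of components whose mean CV RMSE is within
    one standard error of the minimum (components are counted from 1)."""
    best = min(range(len(rmse_mean)), key=lambda i: rmse_mean[i])
    threshold = rmse_mean[best] + rmse_se[best]
    for i, value in enumerate(rmse_mean):
        if value <= threshold:
            return i + 1

# Hypothetical cross-validated RMSE (mean and standard error) for 1 to 6 components.
rmse_mean = [0.50, 0.35, 0.26, 0.22, 0.21, 0.215]
rmse_se = [0.05, 0.04, 0.03, 0.03, 0.03, 0.03]

n_components = one_standard_error_choice(rmse_mean, rmse_se)
```

In this toy case the minimum RMSE is reached with 5 components, but the model with 4 components is within one standard error of it and is therefore preferred as the more parsimonious choice.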
The predictions from the fitted PLS models in ILR coordinates were extracted and transformed back onto the simplex by applying the inverse ILR transformation and multiplying by 100 to express them as percentages, as in the original data. Note that alternative ILR coordinate systems could be used, representing alternative balances between the FA components. However, they are all orthogonal rotations of each other and lead to the same final predicted compositions after ILR back-transformation. Hence, the choice of ILR coordinates is not a relevant question for our purpose here. Finally, the results for the class total unsaturated fatty acids (TUFA) were obtained by arithmetically adding the results for PUFA and MUFA.
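The back-transformation step can be sketched as a round trip: a composition is mapped to two ILR coordinates and then recovered by the inverse transformation, after which TUFA is obtained by adding MUFA and PUFA. The sketch assumes the usual balance form of the coordinates (SFA against PUFA and MUFA, then PUFA against MUFA); function names and the example values are hypothetical.

```python
import math

def ilr_fat(sfa, mufa, pufa):
    """Forward ILR transformation of (SFA, MUFA, PUFA), assumed balance form."""
    z1 = math.sqrt(2.0 / 3.0) * math.log(sfa / math.sqrt(pufa * mufa))
    z2 = math.sqrt(0.5) * math.log(pufa / mufa)
    return z1, z2

def inverse_ilr_fat(z1, z2):
    """Inverse ILR transformation returning (SFA, MUFA, PUFA) in percentages."""
    # Centred log-ratio values implied by the two coordinates.
    log_sfa = math.sqrt(2.0 / 3.0) * z1
    log_mufa = -z1 / math.sqrt(6.0) - z2 / math.sqrt(2.0)
    log_pufa = -z1 / math.sqrt(6.0) + z2 / math.sqrt(2.0)
    parts = [math.exp(v) for v in (log_sfa, log_mufa, log_pufa)]
    total = sum(parts)
    # Closure to 100 expresses the result as percentages.
    return tuple(100.0 * p / total for p in parts)

# Hypothetical composition (not study data): transform and back-transform.
z1, z2 = ilr_fat(11.0, 60.0, 29.0)
sfa, mufa, pufa = inverse_ilr_fat(z1, z2)
tufa = mufa + pufa
```

In a calibration setting, `z1` and `z2` would be the PLS predictions rather than transformed reference values; the inverse step guarantees positive parts that add up to 100.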

Evaluation of calibration model performance
Model performance assessment was conducted on ILR coordinates, as is ordinarily done with PLS calibration models, based on R² and RMSE measures. Values based on the entire reference data sets (training data estimates) and values resulting from the cross-validation process (CV data estimates) were calculated. Additionally, compositional R² and metric standard deviation (MSD) were computed as counterparts of the above measures to evaluate the fit for the fat composition as a whole after inverse ILR transformation (Cayuela-Sánchez et al. 2019;Cayuela-Sánchez et al. 2020). Finally, as customary in similar studies in the literature, the correlations between predicted and reference values for the individual components and their percentage relative deviations were computed.
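The ordinary R² and RMSE measures on a single ILR coordinate can be sketched as below. R² is computed here as one minus the ratio of residual to total sum of squares, which is an assumption, since some software reports the squared correlation instead. The reference and predicted values are hypothetical.

```python
import math

def rmse(actual, predicted):
    """Root mean square error of prediction."""
    n = len(actual)
    return math.sqrt(sum((a - p) ** 2 for a, p in zip(actual, predicted)) / n)

def r_squared(actual, predicted):
    """Coefficient of determination as 1 - SS_res / SS_tot."""
    mean = sum(actual) / len(actual)
    ss_res = sum((a - p) ** 2 for a, p in zip(actual, predicted))
    ss_tot = sum((a - mean) ** 2 for a in actual)
    return 1.0 - ss_res / ss_tot

# Hypothetical reference and predicted values of one ILR coordinate.
actual = [-1.20, -1.05, -0.90, -1.30, -1.10]
predicted = [-1.18, -1.08, -0.93, -1.26, -1.12]

rmse_value = rmse(actual, predicted)
r2_value = r_squared(actual, predicted)
```

Applying the same functions to predictions obtained on held-out folds, rather than on the training data, gives the cross-validated counterparts.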

Potato chips products spectra
The ultraviolet (350-380 nm) and visible (380-780 nm) regions of the analysed spectra were not relevant, even though an absorbance maximum is shown around 360-380 nm (Fig. 1). This maximum could, in part, reflect aromatic compounds or, according to absorbance data described elsewhere (Taniguch and Lindsey 2018), may relate to compounds produced during frying.
Fundamental vibrations, mainly involving carbon-hydrogen combinations, induce various overlapping bands in near-infrared spectra because of their first and second overtones (Shenk et al. 2001). A large peak around 1900 nm corresponds to water. According to Salgó and Gergely (2012), carbohydrate peaks are found in three wavelength regions: (i) between 1585 and 1595 nm, (ii) from 2270 to 2280 nm, and (iii) from 2325 to 2335 nm. This is probably associated with overlapping of the C-H bond vibrations and their stretch and deformation. Water-soluble carbohydrates such as fructose, glucose and sucrose show distinct absorption bands around 2275 nm (Chung and Arnold 2000) due to combinations of O-H and C-C stretching (Osborne and Fearn 1986). Fat absorption bands in potato products depend mainly on the plant oil, or animal fat, used in their elaboration. The potato chips products included in the present study had a fatty acid composition mainly associated with sunflower oil. Sunflower oil of high oleic acid content was present in some samples, as well as olive oil and a few cases of palm oil. Hourant et al. (2000) described the main near-infrared absorption bands of vegetable oils. A broad absorbance band exists around 1220 nm, probably due to second overtones of C-H and CH=CH stretching vibrations from oil. There is a high-intensity absorbance peak at about 2300 nm, caused by a combination of fundamental vibrations from the C-H groups present in FA (Hourant et al. 2000). The bands

Descriptive statistics
Ordinary univariate summary statistics of the reference fat composition and total fat are shown in Table 1.
Note that using statistics such as geometric mean and standard deviation (or more generally compositional statistics that consider the fat composition as a whole; see e.g. Pawlowsky-Glahn et al. 2015) would be alternatives consistent with the relative scale of percentage data. However, ordinary (arithmetic) means and geometric means (not shown) were very similar in this case, and hence the most typical measures are displayed in Table 1 to facilitate comparability with previous literature. The percentages of total fat in the data set ranged between 31.12 and 39.83. As to the fat composition, SFA ranged from 7.22 to 14.92, MUFA from 31.27 to 77.30, PUFA from 8.31 to 59.05, and TUFA from 85.08 to 92.78. Total unsaturated fats represented the largest fraction of the reference fat composition (89.11% vs. 10.89% of SFA on average) and also showed the highest variability (standard error of the mean equal to 3.60 and 3.36 for PUFA and MUFA respectively). These results agree with the fact that most of the potato chip products analysed used high oleic sunflower or regular sunflower oils in the frying process according to their labels.
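The difference between ordinary arithmetic means and a compositional summary can be sketched as follows. The compositional centre is computed as the closed vector of component-wise geometric means, which is the standard definition in the compositional data literature. The data rows are hypothetical and do not reproduce Table 1.

```python
import math

def closure(parts, total=100.0):
    """Rescale positive parts so that they add up to `total`."""
    s = sum(parts)
    return [total * p / s for p in parts]

def arithmetic_means(rows):
    """Ordinary component-wise arithmetic means."""
    n = len(rows)
    return [sum(col) / n for col in zip(*rows)]

def compositional_center(rows):
    """Closed vector of component-wise geometric means."""
    n = len(rows)
    geo = [math.exp(sum(math.log(v) for v in col) / n) for col in zip(*rows)]
    return closure(geo)

# Hypothetical (SFA, MUFA, PUFA) percentages for four products.
rows = [[10.0, 70.0, 20.0], [12.0, 40.0, 48.0],
        [9.0, 75.0, 16.0], [13.0, 35.0, 52.0]]

arith = arithmetic_means(rows)
center = compositional_center(rows)
```

Both summaries add up to 100, but only the compositional centre is invariant to the choice of scale and consistent with the log-ratio treatment used for calibration.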

Principal component analysis
Once registered, the spectra generated from both preparations (fragmented and ground) were run through PCA. Score plots based on the first and second principal components (PC1 and PC2) are shown in Figs. 2a (fragmented samples) and 2b (ground samples). The total data variability accounted for by these plots was 74.9% and 78.1% for fragmented and ground samples respectively. No groupings were obvious and only a few pairs of replicates showed a borderline profile, with the corresponding points lying around the boundaries of the 95% confidence limits. However, no clear outlying cases were apparent, and thus the entire data set was used in subsequent data processing.

Compositional PLS calibration modelling
The calibration models formulated in ILR coordinates as detailed in Eqs. (1)-(2) for fat composition were fitted on spectral data from fragmented and ground preparations separately. The optimal numbers of PLS components were 9 and 14 for the fragmented and ground cases respectively. Figures 3 and 4 show the respective standardised PLS regression coefficients and scatterplots of predicted against actual values from the fitted PLS2 models. Note that these coefficients cannot be interpreted in terms of the individual fat components, but only in relation to the contrasts between components represented by the ILR coordinates.
In any case, we fundamentally use these models here as a device for prediction. In this sense, we observe that the scatterplots of predicted versus actual values show good agreement. The corresponding estimates from the calibration models for total fat based on fragmented and ground samples are shown in Figs. 5 and 6 (using the ILR coordinate given in Eq. (3); the optimal numbers of PLS components were 9 and 4 respectively). Good agreement between predicted and actual values is also observed here, particularly in the fragmented case.
Note that the total fat replicates of one reference sample were identified as highly influential in the PLS regression fit when comparing actual and predicted values using the Cook's distance method (Cook 1977). These values most probably corresponded to a defective reference analysis. Omitting them notably improved the fit of the total fat models, increasing R² from 0.514 to 0.967 for fragmented samples and from 0.539 to 0.800 for ground samples.
The performance statistics based on train and cross-validated data for all the models above are summarised in Table 2.
Both fragmented and ground preparations provided similar calibration results, with train R² values over 0.9 for all ILR coordinates except for the total fat coordinate (ILR TF) in the ground preparation. The corresponding cross-validated estimates also reached values regarded as acceptable. The lowest performance was observed for total fat. This might be related to the Soxhlet reference method, which presented a larger deviation between replicates than gas chromatography. Additionally, the RMSEs of the models can be compared with the dispersion (standard deviation) observed in the reference data as a baseline to assess any improvement in the error of prediction (all consistently computed in ILR coordinates). The standard deviations of the reference data in the ILR coordinates given in Eqs. (1), (2) and (3) were markedly larger than the corresponding RMSE values of the models (either based on train or CV data as reported in Table 2), which further supported the good performance of the fitted calibration models.

Estimation of fat composition
Estimates of (SFA, MUFA, PUFA) expressed in percentage based on the Vis/NIR spectral data were obtained from the fitted calibration models by inverse ILR transformation. The compositional performance measures are summarised in Table 3. Cross-validated coefficients of correlation between actual and predicted values for each component (r_cv) along with percentage relative deviations (RD_cv), as commonly used in component-wise assessment, are also displayed in Table 3.
As reported above based on ILR coordinates, the results from the ground preparation were better than those from the fragmented preparation. The predictions of MUFA and PUFA using ground samples were satisfactory, with r_cv equal to 0.948 and 0.946 respectively. However, the estimation of SFA was more problematic, with a moderate r_cv equal to 0.656. This might be in part related to the limited size of the reference data set, which affects the partition into train and validation sets in the cross-validation procedure. Using more data would help reduce the variability between cross-validation runs. Moreover, note that these individual measures of performance are not really adequate for compositional variables, as they ignore the intrinsic interplay between the parts of the composition as fractions of 100. For example, the correlations coincide by construction for SFA and TUFA. Recall that the basic percentage composition consists of SFA, MUFA and PUFA; therefore, SFA is 100 − (MUFA + PUFA), that is, SFA is just 100 − TUFA. This linear relationship implies that the correlation values will always be the same for both components. This stresses again the pitfalls of considering measures on the individual parts of a composition alone, and the convenience of using compositional measures to summarise the data and assess model performance. In this regard, the ground samples provided compositional R² and R²_CV equal to 0.993 and 0.879 respectively, with associated low deviations in prediction as indicated by the MSD measures (Table 3). The same statistics for the fragmented samples were 0.952 and 0.820. As a compositional counterpart to the common scatterplots of observed versus predicted values, Fig. 7 shows these for the entire percentage fat compositions using ternary plots (fragmented samples on the left-hand side and ground samples on the right-hand side).
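The statement that the correlations for SFA and TUFA coincide by construction can be checked with a short sketch. Since SFA = 100 − TUFA holds for both reference and predicted compositions, the Pearson correlation between actual and predicted values is identical for the two components. The compositions below are hypothetical.

```python
import math

def pearson(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

# Hypothetical actual and predicted (SFA, MUFA, PUFA) percentages, each adding up to 100.
actual = [(10.0, 70.0, 20.0), (12.0, 40.0, 48.0), (9.0, 75.0, 16.0), (13.0, 35.0, 52.0)]
predicted = [(11.0, 68.0, 21.0), (11.5, 42.0, 46.5), (10.0, 73.0, 17.0), (12.0, 37.0, 51.0)]

sfa_actual = [row[0] for row in actual]
sfa_predicted = [row[0] for row in predicted]
tufa_actual = [row[1] + row[2] for row in actual]        # TUFA = MUFA + PUFA
tufa_predicted = [row[1] + row[2] for row in predicted]

r_sfa = pearson(sfa_actual, sfa_predicted)
r_tufa = pearson(tufa_actual, tufa_predicted)
```

The two coefficients agree up to rounding error, which illustrates why component-wise correlations carry redundant information when the parts are constrained to add up to 100.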

Conclusions
This study adds further evidence about the feasibility of using Vis/NIR spectroscopy for the assessment of nutritional contents in nutritional labels of commercial foodstuff products. Importantly, the method integrates a tailored statistical treatment of compositional reference data that avoids possible incoherencies associated with ordinary statistical procedures. Despite the somewhat limited number of samples available in this study to train the calibration models, these already showed acceptable ability to estimate the relative contents of fat classes (SFA, MUFA, and PUFA) in potato chips products according to common model performance measures used in the field, based on both train and cross-validation sets. The results for ground potato chips samples were generally somewhat better than for fragmented samples. In particular, the compositional R² and R²_CV (from train and cross-validation fat composition data respectively) were 0.993 and 0.879 for ground samples and 0.952 and 0.820 for fragmented samples. These results indicate that the proposed method is a practicable alternative to current official methods and show its potential to be used for nutritional labelling and verification in a rapid and inexpensive manner. In any case, this method will require the calibration of specific instruments as well as a periodic validation protocol, as is the case for NIR spectroscopy techniques in general.