The analysis of lubricant evidence is a recent development in sexual assault investigations and, in the absence of any biological evidence, may assist in linking an assailant to the victim or crime scene. An ambient ionization technique, high-resolution direct analysis in real-time mass spectrometry (HR-DART-MS), was employed to characterize a sample set of 33 water-based lubricants. As lubricants are complex multicomponent mixtures, this study investigated whether different thermal desorption temperatures could elucidate different additives and provide additional information. Low-temperature, high-temperature, and thermal desorption/pyrolysis DART-MS protocols were used to characterize the water-based lubricant sample set. The strength of the methodologies was evaluated using positive and negative likelihood ratios that were calculated from inter- and intra-pairwise comparisons using Pearson correlation coefficients. The low-temperature DART-MS protocol afforded valuable information pertaining to volatile additives (e.g., flavors and fragrances) and provided likelihood ratios that give stronger support for true positives and true negatives than the high-temperature protocol when associating individual samples with each other and samples with their respective sub-groupings. The thermal desorption/pyrolysis DART analytical protocol provided enhanced differentiation between samples due to the precise temperature control afforded by a thermal gradient. Moreover, the total ion spectra obtained from the thermal desorption/pyrolysis protocol not only had high positive and negative likelihood ratios but also provided the most discrimination, as determined by empirical cross entropy plots.
Forensic sexual lubricant analysis is currently based on the identification of the main lubricant component which, depending on the type of sexual lubricant, could be a polydimethylsiloxane (PDMS) derivative, a natural hydrocarbon, or a water-soluble component such as polyethylene glycol (PEG), propylene glycol (PG), or glycerol. The identification of these major components has been well established through the use of Fourier transform infrared (FTIR) spectroscopy or gas chromatography-mass spectrometry (GC-MS) [1,2,3]. In instances where the analyst is only concerned with determining whether a lubricant is present on a sexual assault swab, as indicated by the presence or absence of these major constituents, these techniques and protocols are sufficient. However, since sexual lubricants are complex multicomponent mixtures, many of the forensically relevant minor components and additives, i.e., fragrances, flavors, sensations, and anesthetics, are not present at concentrations high enough to be identified by FTIR. Therefore, determining whether two samples may come from a common source may be more difficult than assigning an unknown sample to a specific chemical classification [4,5,6].
It has been established that analyzing lubricants using high-resolution direct analysis in real-time mass spectrometry (HR-DART-MS) can provide discriminating information that classifies samples into more specific groups than just their main marketing classes (e.g., water- and silicone-based personal lubricants). The analysis of lubricants using ambient ionization techniques has increased in recent years, primarily because it does not require chromatographic separation and most of the minor peaks can be observed without being masked by the primary lubricant peak(s). This advantage is due to the high-resolution accurate-mass time-of-flight mass spectrometer (HR-TOFMS), which separates molecular components based on their mass, yielding individual peaks for individual components. Unlike chromatographic separation, which can lead to coelution of peaks if the separation parameters are not sufficient for full separation, HR-MS does not have coelution problems; however, it does not discriminate between isomers. Musah et al. demonstrated that by using HR-DART-MS they were able to identify both trace condom lubricant components and fatty acids from latent fingermarks present on the condom wrapper. An additional study by Proni et al. looked at detecting the spermicide nonoxynol-9 (N-9) in samples collected post-coitus as an indicator for the use of a condom during a sexual assault. They could rapidly detect N-9 up to 1 h after intercourse. While both studies demonstrated that N-9 can be detected easily, the use of the spermicide has greatly diminished in recent years. The decline of spermicidal use in condom lubricants is due to the potential increased uptake of HIV and other sexually transmitted diseases [9,10,11]. As a result, other lubricant components should be utilized as chemical markers for lubricant analysis.
The analysis of lubricants has also been performed by desorption electrospray ionization mass spectrometry (DESI-MS). In this research, Mirabelli et al. were able to utilize DESI-MS to identify several additives in condom lubricants (e.g., octylamine and methylmorpholine) as well as the main polymeric components, such as PDMS, PEG, and N-9. Several studies by this research group have examined whether it is possible to identify unique chemical markers that can further differentiate between sexual lubricants beyond simply their main marketing types. It has been demonstrated that silicone-based lubricants can be classified into smaller groups based on the range of PDMS oligomers, and water-based lubricants can be classified based on additives and other manufacturer-specific compounds such as anesthetics, flavors, and sensation enhancers [5, 13]. In these studies, linear discriminant analysis using a test-set validation approach resulted in accurate classifications for both the silicone lubricant and water-based lubricant classes identified [4, 5, 13]. These sample sets were analyzed at a relatively high desorption temperature of 350 °C, which resulted in spectra primarily comprised of the main lubricant components (e.g., PDMS, PEG, and glycerol) [4, 5, 13]. Interestingly, in the water-based lubricant sample set, several flavored and fragranced glycerol-based lubricants were grouped together based on the similarity of their spectral profiles obtained using a 350 °C HR-DART-MS ionization temperature. At this ionization temperature, many of the flavonoids and fragrances may have rapidly volatilized before they could be ionized and enter the MS. As a result, the contribution of low boiling point additives, such as flavors and fragrances, to the spectra may have been minimal.
Most of the current literature on the analysis of lubricants has not evaluated the impact that low boiling point additives in lubricants have on the ability to discriminate these samples or to determine whether they may have come from a common source.
Fragrances have several physical characteristics that affect how the compound emanates from the solution, or lubricant in this case. Such characteristics include the individual compound’s boiling point (BP), saturated vapor pressure, polarity, diffusivity, and substantivity in the mixture. While the analysis of lubricants can be performed by collecting the natural headspace above the sample, this analytical process is not feasible given the time constraints in a forensic laboratory. Lubricant analysis by HR-DART-MS is typically performed at elevated temperatures in order to desorb the viscous slip agent (i.e., glycerol, PEG, and PDMS) from the sampling medium. However, most fragrances naturally evaporate at temperatures much lower than 350 °C.
One way to determine whether two samples may come from a common source based on instrumental data is through a numerical comparison score, such as a hit quality index or a correlation score. Correlation values close to 1 indicate that the samples are similar, while values close to zero (0) indicate that the samples are significantly different or uncorrelated. Receiver operating characteristic (ROC) plots are one method that has been used to determine the diagnostic ability of a binary classification system, i.e., common source or different sources. If the area under the curve (AUC) is close to 1.0, the classification system is nearly perfect because nearly all of the pairwise comparisons from a common source will have a higher correlation score than any pairwise comparison of two samples from different sources. However, if the value is close to 0.5, then the binary classification system is random at best. While this methodology illustrates the performance of the entire system, it does not aid in identifying an appropriate threshold, which can be difficult to select when determining whether samples come from a common source. Since there is a potential for false positives and false negatives in each binary dataset, it is up to the analyst to determine how much error is acceptable in the final binary classification system when deciding the optimal threshold. The Youden index (J) has been used to aid in addressing what a good threshold value should be [15, 16]. This index maximizes the vertical distance between the line of equality (where the AUC equals 0.5) and the point of interest [x,y] at each Pearson correlation coefficient (PCC) value. A large distance also places the point close to [0,1], where the sensitivity is y = 1 and 1-specificity is x = 0 (i.e., perfect classification). The Youden index can be calculated relatively easily as J = max(TPR − FPR), where TPR is the true positive rate (sensitivity) and FPR is the false positive rate (1-specificity).
In a continuous system, like the ROC plots, the index can be calculated at each point in the system and the [x,y] point that corresponds to the highest J value is the optimal threshold that maximizes the correct classification rate.
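The threshold search described above can be sketched in a few lines of Python. This is a minimal illustration, not the procedure used in the study: it takes J as TPR − FPR (sensitivity + specificity − 1, the standard definition), treats a comparison as "common source" when its PCC is at or above the threshold, and uses made-up PCC scores. The function names `roc_points` and `youden_optimum` are hypothetical.

```python
def roc_points(same_source, diff_source, thresholds):
    """Return (threshold, TPR, FPR) for each candidate PCC threshold.

    A comparison is called 'common source' when its PCC is >= threshold.
    """
    points = []
    for t in thresholds:
        tpr = sum(s >= t for s in same_source) / len(same_source)
        fpr = sum(s >= t for s in diff_source) / len(diff_source)
        points.append((t, tpr, fpr))
    return points


def youden_optimum(points):
    """Pick the ROC point that maximizes J = TPR - FPR."""
    return max(points, key=lambda p: p[1] - p[2])


# Hypothetical PCC scores for illustration only (not data from this study).
same_source = [0.99, 0.97, 0.95, 0.90, 0.72]
diff_source = [0.93, 0.65, 0.30, 0.10, -0.05, 0.02]
thresholds = [0.60, 0.70, 0.80, 0.90, 0.95]

best_t, best_tpr, best_fpr = youden_optimum(
    roc_points(same_source, diff_source, thresholds)
)
best_j = best_tpr - best_fpr
print(f"optimal threshold = {best_t}, J = {best_j:.3f}")
```

With these invented scores the maximum lies at a threshold of 0.70, where every same-source pair is retained and only one different-source pair exceeds the threshold.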
Unfortunately, to make a determination as to whether two samples have come from a common source, neither the PCC value nor the PCC threshold may be sufficient in a final report, and as such, additional support may be necessary to describe how unique this pairwise comparison is against a database. Likelihood ratios (LRs) have been used in forensic analysis to provide an objective and numerical measure for evaluating forensic evidence and the comparison of forensic evidence. Early research by Koons and Buscaglia evaluated the ability to discriminate glass fragments by calculating type I and type II errors for the data set. From here, the use of likelihood ratios has evolved to evaluate the comparison and discrimination of glass, fingerprints [20, 21], smokeless powders, and explosives. Each of these research teams calculated the likelihood ratio by determining the intra- and inter-sample comparison values.
This study intends to assess the probative value of the temperature-dependent mass spectral results when analyzing sexual lubricants. The preliminary focus of this research was to characterize water-based sexual lubricants, as they generally have a more chemically diverse composition, containing numerous additives including flavonoids and fragrances. A subset of water-based lubricants was analyzed at different ionization temperatures to determine which protocol produces more accurate associations between an unknown questioned lubricant and a potential source. This evaluation is based on the volatile additive profiles that each protocol emphasizes in the lubricant mixture. Three types of temperature profiles were used in this study: a low-temperature (150 °C) DART-TOFMS, a high-temperature (350 °C) DART-TOFMS, and a thermal desorption HR-DART-TOFMS protocol. Thermal desorption/pyrolysis HR-DART-MS was included in this study to investigate whether a temperature-dependent DART protocol, rather than a fixed temperature, would increase the presence of the lighter odorants in the resulting spectrum. This study was not intended to be a comparison of analytical protocols, but an assessment of the probative value that each ionization temperature provides when analyzing sexual lubricants, thus allowing the laboratory to understand the limitations of each protocol when analyzing these samples.
Materials and Methods
The samples used in this study are the same 33 water-based lubricants (Table 1) that were utilized in the previous study. This allows for a direct comparison of the impact the ionization temperature has on the resulting profiles and how that affects the ability to accurately classify the sample. The lubricants were purchased through Amazon and eBay online vendors. Research-grade helium and nitrogen gas cylinders were purchased through NexAir (Memphis, TN, USA). The mass calibration standard, PEG with an average molecular weight of 600 (PEG 600), was purchased from Sigma-Aldrich (St. Louis, MO, USA).
Traditional HR-DART-MS Analytical Method
Positive ionization mass spectra were acquired using a JEOL (Tokyo, Japan) AccuTOF™ mass spectrometer (JMS-4000LC) coupled with an IonSense (Peabody, MA, USA) DART® ionization source. Helium was used as the ionization gas for the DART source, and the gas flow rate was approximately 3.6 L/min. The gas temperature was set to 150 °C for low-temperature analysis and 350 °C for normal (i.e., high-temperature) analysis. The needle electrode potential was held at 2 kV, and the exit grid voltage was set to 250 V. The orifice 1, orifice 2, and ring lens voltages were set to 20 V, 5 V, and 5 V, respectively, to generate molecular ions. The RF ion guide voltage was set to 600 V to detect ions that had a mass-to-charge ratio greater than m/z 60. Mass spectra profiles from m/z 60 to 1000 were acquired at a sampling and recording interval of 0.25 ns and 1 s, respectively.
Neat lubricants were sampled by dipping the closed end of a capillary tube into a small amount of the sample and positioning the sample in the sample gap, between the DART ionization source and the MS inlet. The ionization source was set 1 cm from the MS inlet, and the sample was positioned approximately 1 mm from the ceramic cap of the ionization source to maximize the interactions between the sample molecules and the ionizing metastable species. The sample was waved in the ionization stream for 5 s to maximize sample desorption. For every sample, a background spectrum of the atmosphere and the blank capillary tube was collected for 30 s. An internal mass calibration standard (PEG 600) was analyzed in each acquisition run to compensate for mass drift. Five averaged replicates of five measurements were collected per lubricant sample, totaling 25 measurements collected in each sample acquisition. An analysis time of approximately 8 min was found to be sufficient for the collection of all 25 measurements for both the high- and the low-temperature HR-DART-MS analyses. With five replicates of each of the 33 samples, there were 13,530 pairwise comparisons.
Thermal Desorption/Pyrolysis HR-DART-MS Analytical Method
The lubricant samples were analyzed in duplicate using the temperature gradient system ionRocket (BioChromato, San Diego, CA, USA) which was interfaced to the HR-DART-MS. Approximately 2 μL (i.e., enough to completely fill the sample pot) of the neat lubricant was pipetted into the copper sampling “pot,” which was then placed onto the heating block and positioned in the sample gap. The temperature program was held at room temperature for 1 min, followed by a ramp rate of 100 °C/min to a final temperature of 600 °C. The final temperature was held for an additional 1 min, for a total analytical time of 8 min. A glass T-junction was positioned over the sample mounted on the copper sampling stage and extended from the ionization source to the MS inlet, as depicted in Fig. 1. When the sample components desorb from the copper “pot,” they are directed into the glass T-junction, where they are then ionized in the sample gap and subsequently entered into the MS inlet. Careful attention was paid to ensure that the ionRocket and the mass spectrometer were started at the same time to minimize any differences in time variability from sample to sample. Each lubricant was analyzed in duplicate in the positive ionization mode using a gas heater temperature of 550 °C. The remaining mass spectrometric parameters were the same as the traditional HR-DART-MS analysis. PEG with an average molecular weight of 600 was characterized daily prior to sample analysis. With duplicate analysis of the 33 samples, 2145 pairwise comparisons could be made.
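The pairwise comparison counts quoted for the two protocols (13,530 and 2145) follow from counting unordered pairs of replicate spectra. The short sketch below reproduces that arithmetic; the helper names are hypothetical, and the intra-sample count shown covers replicates of one sample only, whereas the study also treats different bottles of the same product as intra-sample.

```python
from math import comb


def n_pairwise(n_samples, n_replicates):
    """Unique unordered pairs among all replicate spectra."""
    return comb(n_samples * n_replicates, 2)


def n_intra(n_samples, n_replicates):
    """Pairs formed by replicates of the same sample (bottle pooling ignored)."""
    return n_samples * comb(n_replicates, 2)


traditional = n_pairwise(33, 5)  # 165 spectra from the fixed-temperature protocols
gradient = n_pairwise(33, 2)     # 66 spectra from the thermal gradient protocol

print(traditional, gradient)             # 13530 2145
print(n_intra(33, 5), n_intra(33, 2))    # 330 33
```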
All data acquisition was performed using JEOL MassCenter. TSS Unity v. 22.214.171.124 (Shrader Software Solutions, Inc., Detroit, MI, USA) was utilized to generate mass calibrated, spectrally averaged (over five measurements), background subtracted mass spectral data for the traditional HR-DART-MS data. With the data collected from the ionRocket, a total ion spectrum (TIS) was generated for each sample by spectrally averaging over the analytical run (0–8.0 min). Individual data matrices (i.e., 150 °C, 350 °C, and TIS) were prepared by binning the mass spectral data using a ±3 millimass unit (mmu) tolerance and a 10% relative intensity threshold. Base peak chronograms (BPCs) were generated from the ionRocket data using the MZmine 2.32 software. A three-dimensional (3D) data matrix was created for the BPCs, with the ion count recorded as a function of time.
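To make the binning step concrete, the following sketch builds a small data matrix from peak lists using a ±3 mmu tolerance and a 10% relative intensity threshold. It is an illustration under stated assumptions rather than the software routine used in the study: the greedy bin construction, the function names, and the two peak lists are all hypothetical.

```python
TOL = 0.003           # ±3 mmu expressed in m/z units
REL_THRESHOLD = 10.0  # percent of the base peak


def filter_peaks(spectrum):
    """Rescale to a base peak of 100 and keep peaks at or above the threshold."""
    base = max(intensity for _, intensity in spectrum)
    scaled = [(mz, 100.0 * intensity / base) for mz, intensity in spectrum]
    return [(mz, rel) for mz, rel in scaled if rel >= REL_THRESHOLD]


def build_bins(spectra):
    """Greedy bins: a new bin starts when a peak lies more than TOL above the last bin."""
    centres = []
    for mz in sorted(mz for sp in spectra for mz, _ in sp):
        if not centres or mz - centres[-1] > TOL:
            centres.append(mz)
    return centres


def to_matrix(spectra):
    """Return (bin centres, one intensity row per spectrum)."""
    filtered = [filter_peaks(sp) for sp in spectra]
    centres = build_bins(filtered)
    matrix = []
    for sp in filtered:
        row = [0.0] * len(centres)
        for mz, rel in sp:
            j = min(range(len(centres)), key=lambda k: abs(centres[k] - mz))
            if abs(centres[j] - mz) <= TOL:
                row[j] = max(row[j], rel)
        matrix.append(row)
    return centres, matrix


# Hypothetical peak lists (m/z, absolute intensity); not data from this study.
spec_a = [(93.0547, 1000.0), (110.0820, 400.0), (181.0841, 50.0)]
spec_b = [(93.0551, 800.0), (110.0812, 200.0), (277.1274, 160.0)]

centres, matrix = to_matrix([spec_a, spec_b])
```

In this example the m/z 181.0841 peak falls below 10% relative intensity and is dropped, and the two spectra share the bins near m/z 93.05 and 110.08.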
Pearson correlation coefficients (PCCs) were calculated for each pairwise comparison in the individual data matrices, which enabled intra-sample and inter-sample comparisons to be conducted. Furthermore, this information made it possible to calculate type I and type II errors, which were then used to generate ROC plots. These ROC curves are based on a binary decision system where each pairwise comparison value (i.e., PCC) is above or below a pre-set threshold value. If the pairwise comparison value is higher than the threshold value, the pair is classified as a positive (i.e., in this study, the two samples are considered to come from a common source). Conversely, if the pairwise comparison PCC is lower than the threshold value, the samples are considered to have come from different sources (i.e., a negative). The AUC was calculated for each sample set and was an indication of the likelihood of making a correct identification before an incorrect identification. These plots were then used to assess which of the five discrete data models (i.e., 150 °C, 350 °C, BPC, TIS, and BPC-TIS) best predicts the classes. Additionally, the type I and type II errors were used to calculate the positive and negative likelihood ratios.
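The split of pairwise PCC values into intra-sample and inter-sample sets can be sketched as follows. The sample labels and intensity vectors are invented for illustration, and `pearson` and `split_comparisons` are hypothetical helper names rather than functions from the software used in the study.

```python
from itertools import combinations
from math import sqrt


def pearson(x, y):
    """Pearson correlation coefficient between two equal-length vectors."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def split_comparisons(spectra):
    """spectra: list of (sample_label, binned_intensity_vector)."""
    intra, inter = [], []
    for (label_a, vec_a), (label_b, vec_b) in combinations(spectra, 2):
        target = intra if label_a == label_b else inter
        target.append(pearson(vec_a, vec_b))
    return intra, inter


# Hypothetical binned spectra: two replicates each of two lubricants.
spectra = [
    ("A", [100.0, 40.0, 0.0, 5.0]),
    ("A", [100.0, 38.0, 0.0, 6.0]),
    ("B", [10.0, 0.0, 100.0, 60.0]),
    ("B", [12.0, 0.0, 100.0, 55.0]),
]

intra, inter = split_comparisons(spectra)
print(len(intra), len(inter))  # 2 4
```

Replicates of the same hypothetical lubricant correlate strongly, while comparisons across the two lubricants give negative coefficients, mirroring the intra- versus inter-sample separation the ROC analysis relies on.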
In order to provide the analysts with an interpretation of the PCC score obtained when comparing an unknown lubricant sample to a potential source(s), likelihood ratios were evaluated. As an example, if the analyst had an unknown and a known sample that had a final correlation coefficient of 0.7, it would be difficult to ascertain if the two samples would be considered as coming from a common source or not. To determine the probability that the PCC value obtained will be observed for a true positive result (i.e., correlated pairwise comparison with a high PCC value) versus a false positive result (i.e., uncorrelated pairwise comparison with a high PCC value), a positive likelihood ratio (PLR) could be calculated. The PLR is the sensitivity value (i.e., true positive rate) divided by the 1-specificity value (i.e., false positive rate), refer to Eq. (1) [26, 27]. It is also necessary to determine the probability of this PCC score for a false negative result versus the same PCC for a true negative result. To determine this, the negative likelihood ratio (NLR) can be calculated by dividing the 1-sensitivity value (i.e., false negative) by the specificity value (i.e., true negative), refer to Eq. (2).
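The two ratios defined above can be computed directly from the intra- and inter-sample PCC lists at a chosen threshold. The sketch below is illustrative only: the scores are invented, the function name is hypothetical, and returning infinity when the denominator is zero is an assumption made here so that thresholds with no false positives (where a PLR cannot be calculated) do not raise an error.

```python
def likelihood_ratios(intra, inter, threshold):
    """Return (PLR, NLR) for a 'common source' call at PCC >= threshold.

    PLR = sensitivity / (1 - specificity)
    NLR = (1 - sensitivity) / specificity
    """
    tp = sum(s >= threshold for s in intra)   # same source, called positive
    fn = len(intra) - tp                      # same source, called negative
    fp = sum(s >= threshold for s in inter)   # different sources, called positive
    tn = len(inter) - fp                      # different sources, called negative

    sensitivity = tp / (tp + fn)
    specificity = tn / (tn + fp)

    plr = sensitivity / (1 - specificity) if fp else float("inf")
    nlr = (1 - sensitivity) / specificity if tn else float("inf")
    return plr, nlr


# Hypothetical PCC values; not data from this study.
intra = [0.98, 0.96, 0.91, 0.85, 0.62]
inter = [0.88, 0.75, 0.40, 0.20, 0.10, 0.05, -0.02, -0.06, 0.01, 0.30]

plr, nlr = likelihood_ratios(intra, inter, 0.70)
print(f"PLR = {plr:.2f}, NLR = {nlr:.2f}")
```

For these invented scores, sensitivity and specificity are both 0.8 at a threshold of 0.70, giving a PLR of about 4 and an NLR of about 0.25.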
The thermal desorption/pyrolysis HR-DART-MS data were processed differently, as the data generated were in a 3D array consisting of m/z, time (minutes), which is also temperature dependent, and the absolute ion intensity at each m/z for each chronogram time point. An example of the complexity of the data is presented in Figure 2 A and C, where the traces of the individual components in the lubricant, as indicated by different m/z values, have different profiles over the thermal period. It is also easily observed how different the individual trace profiles are between two similar lubricants with different flavor/fragrance profiles. Comparisons between lubricant samples were conducted using three methods: (1) the BPC, (2) an averaged TIS profile, and (3) a fused BPC/TIS profile. The BPC reflects the change in the base peak across the entire temperature profile. This chronogram is generated by examining the change in the base peak within the mass spectral data acquired at every time interval over the entire 8-min analysis period. As an example, in Figure 2 A, right side, there are eight changes in the base peak observed in the mass spectral data over the analysis period. The BPC is generated by summing the ion count for the most intense base peak in any given mass spectrum as a function of time. BPC profiles are different for each sample, and the uniqueness increases with the number of base peak changes observed for each sample. However, as the BPCs only reflect base peak changes as a function of time and do not contain mass spectral information, the TIS was also evaluated individually. The third approach was to fuse the BPC profile and the TIS mass spectral data into one dataset to determine whether increased differentiation of samples could be achieved.
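A simplified view of how a BPC and a TIS relate to the same time-resolved data is sketched below. It assumes each scan has already been reduced to a mapping from binned m/z to intensity, takes the BPC value at each time point to be the intensity of that scan's most intense ion, and uses invented scans; it is not a description of the MZmine processing used in the study, and all function names are hypothetical.

```python
def base_peak_chronogram(scans):
    """scans: list of (time_min, {mz: intensity}).

    Returns [(time_min, base_peak_mz, base_peak_intensity)].
    """
    bpc = []
    for time_min, spectrum in scans:
        base_mz = max(spectrum, key=spectrum.get)
        bpc.append((time_min, base_mz, spectrum[base_mz]))
    return bpc


def count_base_peak_changes(bpc):
    """Number of times the base peak m/z switches between consecutive scans."""
    return sum(1 for prev, cur in zip(bpc, bpc[1:]) if prev[1] != cur[1])


def total_ion_spectrum(scans):
    """Average intensity of every m/z over the whole run."""
    totals = {}
    for _, spectrum in scans:
        for mz, intensity in spectrum.items():
            totals[mz] = totals.get(mz, 0.0) + intensity
    return {mz: total / len(scans) for mz, total in totals.items()}


# Hypothetical scans: a volatile additive dominates early, the base dominates later.
scans = [
    (1.0, {122.1186: 50.0, 93.0547: 10.0}),
    (2.0, {122.1186: 80.0, 93.0547: 40.0}),
    (3.0, {122.1186: 30.0, 93.0547: 200.0}),
    (4.0, {93.0547: 500.0}),
]

bpc = base_peak_chronogram(scans)
changes = count_base_peak_changes(bpc)
tis = total_ion_spectrum(scans)
```

The sketch shows why the two representations are complementary: the BPC records when the dominant ion changes but discards the rest of each spectrum, whereas the TIS keeps every ion but averages away the time (temperature) dimension.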
Results and Discussion
Low Ionization Temperature Selection for HR-DART-MS
A preliminary study was initially conducted to ascertain the gas heater temperature that provided a spectral profile of the fragrances and other additives with minimal contribution from the main lubricant base. Two flavored lubricants and one unflavored sample were characterized at six different temperatures: 50, 100, 150, 200, 250, and 350 °C. The 2D plot of the WET Sweet Cherry-flavored lubricant analysis is presented in Figure 3, along with the total ion chronogram, in which each peak depicts when the sample was introduced into the ionization stream.
Several peaks are present at the lower temperatures (50–150 °C) that are not readily observed at higher temperatures. A few of these peaks that diminished as the temperature increased were ammoniated pentylene glycol [M+NH4]+ (m/z 122.1186), propyl paraben [M+H]+ (m/z 181.0841), ammoniated propyl paraben [M+NH4]+ (m/z 198.1105), triethyl citrate [M+H]+ (m/z 277.1274), and ammoniated triethyl citrate [M+NH4]+ (m/z 294.1522). As is observed from Figure 4, as the temperature increases (i.e., 350 °C) the relative intensity of the glycerol peak (i.e., lubricant base) increases, while the relative intensity of the other additive peaks diminishes.
However, for complex multicomponent samples like lubricants, while the analyst may be able to discern the main marketing group (i.e., lubricant base) at higher ionization temperatures, the ability to differentiate within these marketing types may be more difficult with the loss of the additives. Consequently, different desorption temperatures, in one analytical run, may provide additional information by elucidating and emphasizing certain components at different time points during the analysis. This was the rationale behind including the thermal desorption/pyrolysis attachment for DART-MS, such that the lubricant could be chemically interrogated over an extended temperature range.
DART Analysis at 350 °C
Most of the lubricant samples analyzed at the high ionization temperature generated spectra that were dominated by various glycerol adducts in the positive ionization mode: [M+H]+ (m/z 93.0547), [M+H-H2O]+ (m/z 75.0436), [M+NH4]+ (m/z 110.0820), [2M+H-2H2O]+ (m/z 149.0826), and [2M+H]+ (m/z 185.1023). Other samples had base peaks attributed to ethoxydiglycol, propylene glycol, phenoxyethanol, lidocaine, and butylene glycol.
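The glycerol adduct assignments above can be checked against calculated monoisotopic m/z values. The sketch below does so using standard monoisotopic atomic masses and the electron mass; the helper names are hypothetical, and the 3 mmu window used in the comparison simply mirrors the binning tolerance quoted in the methods rather than a stated identification criterion.

```python
# Standard monoisotopic atomic masses (u) and the electron mass (u).
MONO = {"C": 12.0, "H": 1.00782503207, "N": 14.0030740048, "O": 15.99491461956}
ELECTRON = 0.00054857990


def mono_mass(formula):
    """Monoisotopic mass of a neutral formula given as {element: count}."""
    return sum(MONO[element] * count for element, count in formula.items())


M = mono_mass({"C": 3, "H": 8, "O": 3})             # glycerol, C3H8O3
PROTON = MONO["H"] - ELECTRON                       # H+
WATER = mono_mass({"H": 2, "O": 1})                 # H2O
AMMONIUM = mono_mass({"N": 1, "H": 4}) - ELECTRON   # NH4+

calculated = {
    "[M+H]+": M + PROTON,
    "[M+H-H2O]+": M + PROTON - WATER,
    "[M+NH4]+": M + AMMONIUM,
    "[2M+H-2H2O]+": 2 * M + PROTON - 2 * WATER,
    "[2M+H]+": 2 * M + PROTON,
}

# m/z values as reported in the text.
reported = {
    "[M+H]+": 93.0547,
    "[M+H-H2O]+": 75.0436,
    "[M+NH4]+": 110.0820,
    "[2M+H-2H2O]+": 149.0826,
    "[2M+H]+": 185.1023,
}

for adduct, mz in calculated.items():
    error_mmu = 1000.0 * (reported[adduct] - mz)
    print(f"{adduct:14s} calc {mz:.4f}  reported {reported[adduct]:.4f}  error {error_mmu:+.1f} mmu")
```

All five reported values fall within 3 mmu of the calculated masses, which is consistent with the assignments.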
Hierarchical cluster analysis (HCA) is an unsupervised technique which generates groupings in the sample set based on similarities or dissimilarities in the DART mass spectral data acquired. HCA was conducted on the data to identify the samples that grouped closely together (Table 2A). Interestingly, group 1 consisted of all the flavored lubricants, one sensation lubricant (Jo H2O Warming), and glycerol-based water lubricants. These samples grouped together primarily because their spectra were dominated by peaks characteristic of glycerol. The remainder of the samples were grouped differently based on the presence of additional components and additives providing a major contribution to the resulting spectra (Table 2A). A brief overview of the lubricant characterization performed at the high desorption temperature is presented here; however, a detailed examination of these results has been previously described. Pearson correlation coefficient values were calculated for each of the 13,530 pairwise comparisons to determine the inter- and intra-sample PCC values in the dataset. The intra-sample correlation values averaged 0.962 with a standard deviation of 0.065 and a median of 0.980. The inter-sample correlation values had an average of 0.268 ± 0.407; the median value was 0.074. A correlation plot depicting the similarity (dark blue dots) and dissimilarity (light blue or red dots) is presented in Figure 5. Several samples had high intra-sample correlations without any correlation with the remaining lubricant samples, supporting the HCA clustering. Since several samples had multiple bottles analyzed, intra-sample correlations included samples of the same manufacturer and type taken from different bottles. Statistically, the correlation values between different bottles were the same as the intra-bottle correlation values, and the two were therefore treated as equivalent.
Additionally, in a forensic setting, it would not be necessary to associate the unknown sample to a specific bottle, just to a specific class, manufacturer, or type.
The AUC of the ROC plot for the 350 °C samples yielded a value of 0.8652 for inter-sample comparison (Table 3, Figure 6). This value indicates that good discrimination can be achieved. In an operational setting, it is difficult to determine if an unknown sample may come from a common source or a different source from the known sample by just using an arbitrary PCC value. While most people would agree that a PCC = 0.95 is highly correlated, it is not sufficient for this to be the threshold of a common source determination without evaluating the entire system. One manner to determine the optimal threshold using ROC plots is the Youden index (J). This measure of the optimal threshold has been determined to be the best manner in various situations [16, 28]. The optimal threshold for these samples is 0.92 (Table 3), which indicates if the PCC for any sample comparison is greater than 0.92, it should be a true positive (i.e., the two samples are very likely to have come from a common source). However, this is not sufficient information to support the final determination and likelihood ratios should be investigated to provide additional objective information for making this determination.
The PLRs and NLRs for these samples (e.g., sample and class pairwise comparisons), at several correlation threshold values, are presented in Tables 3 and 4, respectively. When associating an unknown sample to either a known sample or an example of a class, it is necessary to understand how strong the LR is for that comparison based on the calculated PCC. The PLRs for the 350 °C sample pairwise comparisons ranged from 3.92 to 8.66 when the correlation threshold for a true positive determination was 0.60 to 0.95, respectively. At the optimal threshold of 0.69, the PLR is approximately 4.45, which indicates that it is 4.45 times more likely that comparisons with a PCC > 0.7 will come from a common source (i.e., true positive). In the same vein, the probability that a sample comparison at this threshold was a potential true negative is relatively low at 0.022, indicating moderate support that the two samples are from two different sources. However, as the PCC value decreases, the support that the two samples are from different sources gets stronger. Referencing Martire’s paper, which discusses the value of LRs and their corresponding verbal equivalents, the PLR would offer weak to limited support that these two samples have a common source. The NLR would also offer moderate support that this comparison is a true negative. However, these results indicate that it is feasible to state that the two samples come from a common source, albeit the support is weak.
The PLRs and NLRs were calculated for the pairwise class comparisons to associate an unknown sample to a potential class. The intra-class correlation values had an average of 0.918 ± 0.094 and a median of 0.949. The inter-class correlation values had an average of 0.079 ± 0.235; the median value was −0.061. There was minimal overlap observed between the Gaussian distribution curves for the intra-class and the inter-class samples based on their correlation values. This is also evident from the high AUC, 0.909, for the associated ROC plot, which is higher than that for the sample pairwise comparisons. However, the optimal threshold for this sample set was 0.69, which was much lower than that for the sample pairwise comparisons. This is likely because samples in the same class may come from different manufacturers and inherently have lower PCC values than sample comparisons. The first PLR value that could be calculated had a PCC threshold of 0.70. This threshold was significantly lower than on the sample comparison scale because there were no false positives (1-specificity) observed at PCC values greater than 0.75. However, the PLR at a PCC of 0.70 was high at 592.63. This means that it is approximately 593 times more likely for the two samples to be of the same class than from two different classes, thus offering moderately strong support for this position. However, at 0.65, the PLR value drops to 50.06 (moderate support). Therefore, when determining whether an unknown sample comes from a particular class, any sample with a PCC value greater than 0.7 would have strong support for accurate classification. Additionally, this value is greater than the optimal threshold of 0.69, which gives additional support that any PCC value greater than 0.69 will have strong support that two samples are from the same class. Conversely, at a PCC < 0.70, the NLR is 0.033 or less, and as such would provide moderate to moderately strong support that two samples may be from different sources.
DART Analysis at 150 °C
While the dendrogram obtained from the high-temperature DART data revealed six distinct groupings, eight groupings were observed for the low-temperature DART data when using a similar distance metric (Table 2B). At 350 °C, glycerol was the sole component that grouped 15 different samples together; at the lower temperature, this group was separated into three different groups (i.e., groups 1, 4, and 8). The WET flavored samples separated from the glycerol group: the group 4 samples were distinguished by the presence of maltol, triethyl citrate, and propylparaben, and the group 8 samples by maltol, ethyl maltol, and pentylene glycol. The other four groups in the 150 °C sample set were the same as those in the 350 °C set; however, more components were observed at the lower ionization temperature.
Using the eight defined groupings, the AUC of the ROC plot for individual sample comparisons was 0.983, and for class comparison, the AUC was 0.989. At the lower DART ionization temperature, it was apparent that the lower temperature yielded additional peaks that may further differentiate samples. An example of two flavored lubricants is provided in Figure 7. At the 350 °C ionization temperature, both WET flavored lubricants have a similar mass spectrum, which justified their similar correlation values. At the lower desorption temperature of 150 °C, the differences between the two flavored samples are immediately noticeable, and these samples could be distinguished based on various additives, possibly attributed to the differences in their flavor profiles. Additional examples of the differences between the spectra at 350 °C and 150 °C are presented in the appendix (Fig. S.2). The correlation values indicate that there are actual differences between the WET flavored samples at 150 °C (Figure 5, lower right, black triangle) in comparison to the correlation values of the WET samples analyzed at 350 °C (Figure 5, lower left, black triangle). The flavored lubricants that had a similar mass spectral profile at 350 °C were differentiated with high intra-sample correlation values and low inter-sample correlation values when analyzed at 150 °C.
The intra-sample comparisons had an average correlation value of 0.961 with a standard deviation of 0.055 and a median of 0.980. The inter-sample correlation average was significantly lower at 0.146 ± 0.286; the median value was − 0.018. The ability to accurately associate samples is strong based on the high AUC of the ROC curve as well as the likelihood ratio values at each correlation threshold value. The AUC for the 150 °C individual samples was 0.983, significantly higher than the AUC at 350 °C at the optimal threshold 0.84 (Table 3). In determining if the two samples are potentially from the same class, the AUC at 150 °C increased to 0.99. The PLRs for the sample pairwise comparisons at 150 °C ranged from 9.28 to 80.82 when the correlation threshold for a true positive match determination was 0.60 to 0.95, respectively. If a pairwise comparison had a PCC value of 0.90, that result would be 32.67 times more likely to be obtained if the two samples were from a common source (i.e., a true positive) than if they were from different sources. In the same vein, for this same sample comparison, the NLR was 0.108; that is, the result is 0.108 times as likely if it is a true negative, i.e., if the two samples were from different sources. The NLR would therefore offer only weak to limited support for this comparison being a true negative. However, these results indicated that it is feasible to state that the two samples come from a common source, albeit the support is weak. While the AUC, PLRs, and NLRs are higher for these samples in comparison to the 350 °C dataset, the likelihood ratio indicates that there is now moderate support that two samples are of the same class if they have a PCC equal to or greater than 0.90, with a PLR of 80.82, i.e., a result 80.82 times more likely to be observed if the two samples are from the same class than from different classes.
At a value of 0.108 at 0.90 PCC, the NLR still indicates that there is moderately weak support of these two samples being a true negative.
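The PLR and NLR values quoted above can be illustrated with a short sketch. It assumes the conventional diagnostic-test definitions, PLR = TPR/FPR and NLR = FNR/TNR, applied to the intra-sample (same source) and inter-sample (different source) PCC values at a chosen threshold; the function name and the PCC lists below are hypothetical and are not values from Table 3.

```python
def likelihood_ratios(intra_pcc, inter_pcc, threshold):
    """Return (PLR, NLR) when a PCC >= threshold is called a common-source match.

    Assumed definitions: PLR = TPR / FPR and NLR = (1 - TPR) / (1 - FPR).
    A ratio is returned as None when its denominator is zero.
    """
    tpr = sum(1 for r in intra_pcc if r >= threshold) / len(intra_pcc)
    fpr = sum(1 for r in inter_pcc if r >= threshold) / len(inter_pcc)
    plr = tpr / fpr if fpr > 0 else None          # undefined without false positives
    nlr = (1 - tpr) / (1 - fpr) if fpr < 1 else None
    return plr, nlr

# Hypothetical PCC values (illustration only).
intra_pcc = [0.98, 0.97, 0.95, 0.92, 0.70]
inter_pcc = [0.95, 0.60, 0.40, 0.30, 0.20, 0.15, 0.10, 0.00, -0.02, -0.05]

plr_090, nlr_090 = likelihood_ratios(intra_pcc, inter_pcc, 0.90)
plr_096, nlr_096 = likelihood_ratios(intra_pcc, inter_pcc, 0.96)  # no false positives
```

At the higher threshold of this toy dataset no inter-sample comparison exceeds the cut-off, so the PLR is undefined, which is the same situation in which a PLR cannot be reported for a real dataset.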
The intra-class comparisons had an average correlation value of 0.861 ± 0.148 and a median value of 0.900. The inter-class correlation average was significantly lower at 0.068 ± 0.178; the median value was − 0.024. The class-based determination generated higher AUC, PLRs, and NLRs in comparison to the 150 °C inter-sample comparisons; however, the optimal PCC threshold was lower at 0.60 (Table 3). The lowest intra-sample correlation value was 0.112, and the highest inter-sample correlation value was 0.781. PLRs could only be calculated for PCC thresholds from 0.60 to 0.75, but their values were significantly higher than the 150 °C sample-based PLRs: from 102.26 to 3148.15, respectively. At a PCC of 0.75, it would be more than 3000 times more likely to obtain this value if the samples were from the same class, providing strong support for the final determination. Despite the higher AUC, at 0.989, the NLR increased to 0.168, which is in the same range as the NLRs of the inter-sample pairwise comparisons. A PLR could not be calculated at thresholds higher than 0.75 because there were no false positives. A complete list of the components identified by analysis at a lower desorption temperature is provided in Table S.1.
Aside from the low FPR and FNR values, which indicate good separation between the Gaussian curves of correct pairwise comparisons and incorrect pairwise comparisons, an additional difference between the 350 °C data and the 150 °C data was the increased number and relative intensity of ammoniated peaks at the lower temperature. Ammoniated adducts are commonly observed for polar analytes, and while these compounds are present in the profiles at both temperatures, it is not immediately clear why there is an increased abundance of ammoniated adducts at the low ionization temperature; this increase is currently under investigation. Although the major lubricant base does not fully desorb until the ionization temperature reaches a minimum of 300 °C, the lubricant base is still observed at the low temperature, which can aid in identifying the main marketing type of the lubricant while still allowing formulation-specific components to be identified.
Thermal Desorption DART-MS Analysis
The thermal desorption/pyrolysis attachment (i.e., ionRocket) was a useful tool that enabled a more complete characterization of the lubricant profile through precise control of the temperature gradient. These results were compared with the discrimination ability of the samples using the low and high ionization temperatures of traditional DART-MS analysis. The trace chronograms depict the desorption of individual components present in the lubricant as the temperature rises, consequently increasing the potential of identifying some of the volatile additives that desorb at low temperatures in the resulting mass spectrum. The data from the ionRocket was analyzed in three different ways: base peak chronograms (BPC), total ion spectrum (TIS), and fusion of the BPC and TIS datasets.
Base Peak Chronogram (BPC) Analysis
By overlaying the individual component profiles, a unique desorption profile of the lubricant can be created (i.e., the BPC). It is important to note that this is merely a simplistic chronogram and as such lacks any mass spectral information. The BPC example shown in Figure 2A had eight base peak changes during analysis, which was reproducible for the duplicate. The BPC used in this dataset was the outermost profile obtained when all of the individual component profiles were merged together. Correlation values were calculated for each BPC profile pairwise comparison, totaling 2145 pairwise comparisons. The intra-sample comparisons had an average correlation value of 0.901 with a standard deviation of 0.060; the median value was 0.923. The inter-sample correlation average was significantly lower at 0.634 ± 0.190, with a median of 0.656. The lowest intra-sample correlation value, 0.688, may have resulted from slight variability in desorption time. The ROC curve for these samples had an AUC of 0.925 with an optimal threshold of PCC = 0.81. The PLRs for the BPC sample pairwise comparisons ranged from 1.95 to 28.26 for PCC values of 0.60 to 0.90, respectively. At a PCC of 0.90, this PLR was not as high as that observed with the low ionization temperature, primarily because comparisons are made based on the shape of the BPC and not on the presence of individual peaks. However, the PLR was higher than at the 350 °C ionization temperature, indicating that the overall BPC profile was more unique to each sample and, as a result, more discriminatory. There is weak to moderate support for a true positive determination of two BPC profiles coming from a similar source. At the optimal threshold of 0.80, the PLR is significantly lower at 6.18. The NLRs yielded similar results: at a PCC of 0.95 the NLR was 0.852, and at a PCC of 0.70 the NLR was 0.016. These values would offer weak and moderate support of two samples being a true negative, respectively.
Without strong support for either determination of the pairwise comparison, true positive or true negative, it would be difficult to make a determination in an operational setting at these PCC values.
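The construction of the BPC as the outermost profile of the merged component chronograms can be sketched as follows. This is an illustrative reading of that description, assuming each component is available as an extracted ion chronogram sampled at the same scans; the function name, ion labels, and intensities are hypothetical.

```python
def base_peak_chronogram(ion_chronograms):
    """Build the outer envelope of several ion chronograms.

    ion_chronograms maps an ion label to its intensities per scan (equal lengths).
    Returns the envelope and the number of times the base peak ion changes.
    """
    labels = list(ion_chronograms)
    n_scans = len(ion_chronograms[labels[0]])
    if any(len(ion_chronograms[k]) != n_scans for k in labels):
        raise ValueError("all chronograms must cover the same scans")
    envelope, base_peaks = [], []
    for t in range(n_scans):
        # The base peak at scan t is the ion with the highest intensity.
        top = max(labels, key=lambda k: ion_chronograms[k][t])
        base_peaks.append(top)
        envelope.append(ion_chronograms[top][t])
    changes = sum(1 for a, b in zip(base_peaks, base_peaks[1:]) if a != b)
    return envelope, changes

# Hypothetical chronograms for three ions over five scans (illustration only).
chronograms = {
    "ion A": [5, 20, 60, 30, 10],
    "ion B": [1, 25, 40, 80, 20],
    "ion C": [0, 2, 10, 35, 50],
}
bpc, n_changes = base_peak_chronogram(chronograms)
```

Because only the envelope is retained, two lubricants with different ions but similarly shaped desorption profiles would give similar BPCs, which is consistent with the weaker discrimination reported for this dataset.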
Similar results were obtained for the class-based pairwise comparisons. The intra-class comparisons had an average of 0.885 ± 0.063, and the median was 0.892. The inter-class pairwise comparison values were lower, but not significantly so: the average was 0.621 ± 0.186 and the median was 0.643. The AUC for the class comparison, 0.920, and the optimal threshold, 0.80, were approximately the same as the sample comparison values. The similarity between these two manners of comparing samples was likely due to the similarity of the BPC profiles for different samples. The PLR values were similar as well, ranging from 1.74 to 15.63 for PCC values of 0.60 to 0.90. Regardless of the PCC obtained for a pairwise comparison, there is weak to moderate support for a true positive determination of two samples being from a similar class. The NLRs yielded similar values and support as the PLRs.
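The sample-based and class-based comparisons differ only in how each pair of profiles is labeled. The sketch below enumerates all pairs under the assumption that each of the 33 lubricants was analyzed in duplicate (66 profiles), which reproduces the 2145 pairwise comparisons reported for the BPC data. The class assignment used here is purely hypothetical and does not correspond to the groupings in Table 2B.

```python
from itertools import combinations

# Assumption: 33 lubricants, each analyzed in duplicate, giving 66 profiles.
profiles = [(sample, replicate) for sample in range(33) for replicate in range(2)]
pairs = list(combinations(profiles, 2))

# Hypothetical class labels (illustration only, not the study's groupings).
class_of = {sample: sample % 8 for sample in range(33)}

def is_intra(pair, key):
    """True when both profiles of a pair share the same label under `key`."""
    return key(pair[0]) == key(pair[1])

intra_sample = [p for p in pairs if is_intra(p, key=lambda prof: prof[0])]
inter_sample = [p for p in pairs if not is_intra(p, key=lambda prof: prof[0])]
intra_class = [p for p in pairs if is_intra(p, key=lambda prof: class_of[prof[0]])]
```

Under this assumption only 33 of the 2145 pairs are intra-sample comparisons, so the false positive rate is estimated from far more comparisons than the true positive rate.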
Total Ion Spectrum (TIS) Analysis
The TIS profiles of the thermal desorption/pyrolysis results generated more discriminatory data in both the sample and class comparisons. The TIS was an average across the entire profile of all of the components present at the different ionization temperatures. The average correlation value for the intra-sample comparisons was 0.963 ± 0.049 with a median value of 0.982. The PCC for the inter-sample comparisons was significantly lower at 0.133 ± 0.241; the median was 0.015. The AUC for the ROC plot was 0.998, which reflects the very low overlap of the inter- and intra-sample PCC values. The optimal PCC threshold is 0.80. The PLRs ranged from 13.76 to 274.85. If a PCC of 0.80 or greater is obtained for a sample pairwise comparison, that result would be 44.83 times more likely if the two samples come from a common source, thus providing moderately strong support. The NLRs of the sample comparison would yield weak support at a PCC of 0.95 and moderate support for PCCs between 0.70 and 0.90 that the two samples could be from different sources. However, with moderately strong support from the PLR that two samples are from a common source, this could be a strong indication for that determination. The HCA dendrogram showed that all of the samples were grouped individually from one another; however, samples from the same manufacturer/brand were clustered together. The heatmap/dendrogram (Fig. S.3) clearly delineated 16 different groups based on the TIS profiles.
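The description of the TIS as an average across the entire desorption profile can be sketched as below. The snippet assumes that every scan has been binned onto the same m/z grid and simply takes the mean intensity per bin; the function name and the scan intensities are hypothetical.

```python
def total_ion_spectrum(scans):
    """Average a list of scans, each a list of intensities on one shared m/z grid."""
    if not scans:
        raise ValueError("at least one scan is required")
    n_bins = len(scans[0])
    if any(len(scan) != n_bins for scan in scans):
        raise ValueError("all scans must share the same m/z grid")
    return [sum(scan[i] for scan in scans) / len(scans) for i in range(n_bins)]

# Hypothetical scans (illustration only): an early scan dominated by a volatile
# additive, a middle scan, and a late scan dominated by a less volatile component.
scans = [
    [10.0, 0.0, 0.0],
    [4.0, 6.0, 0.0],
    [1.0, 3.0, 8.0],
]
tis = total_ion_spectrum(scans)
```

Unlike the BPC, the averaged spectrum keeps a contribution from every ion that desorbs at any point of the gradient, which is one plausible reason for the higher discrimination of the TIS data.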
The correlation values for the intra-class and inter-class groups were 0.932 ± 0.093 and 0.126 ± 0.231, respectively. Their associated medians were 0.971 and 0.014. The AUC of the ROC plot for the TIS was very similar between the sample measurements and the class measurements, but the optimal PCC threshold is lower at 0.65 (Table 3). Despite the lower optimal threshold, the high AUC still indicates high discrimination between true positives and true negatives; the optimal PCC value is simply lower because these are class pairwise comparisons. This indicated that the average TIS profile generated unique spectra that almost individualized all the samples in the dataset. The only samples that grouped together were in group 4, which contained Adam & Eve, WET Kiwi Strawberry, and WET Blueberry. Interestingly, KY Ultragel and Jo H2O Cooling lubricants could not be discriminated using either the low- or the high-temperature DART-MS protocol; however, these lubricants could be clearly separated using the TIS, based on the identification of sorbitol in KY Ultragel. Similar PLR and NLR results were obtained for the class-based comparisons as for the sample comparisons. The BPC and TIS data were fused together to see if better LRs could be achieved when the two datasets were used for each sample. The results were very similar to those of the TIS data.
Evaluation of the Likelihood Ratios Performance for Each Dataset
Likelihood ratios are often used to determine the strength of the evidence; however, this ratio alone cannot stand on its own in a court of law. It is important to note that LRs are part of a larger question: when the LR is multiplied by the prior odds of the case, one obtains the posterior odds, from which the posterior probability follows. The fact finder determines the prior probabilities in a given case by evaluating all of the available information except for the evidence. The strength of the evidence is provided to the fact finder by the forensic examiner when they report the LR. The posterior probabilities indicate the probability of an outcome. This relationship can use both the PLR and the NLR to determine the posterior probability for both situations: (1) when the unknown sample and the known sample have a common source (i.e., PLR) and (2) when the unknown sample and the known sample have different sources (i.e., NLR). To determine if the LRs calculated in this study are reliable, it is necessary to determine if the likelihood ratios have good discrimination and are well calibrated by evaluating the ROC plots and empirical cross entropy (ECE) plots.
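The relationship between the LR, the prior, and the posterior can be made concrete with a short sketch using the odds form of Bayes' theorem (posterior odds = LR × prior odds). The LR values are the PLR and NLR quoted earlier for the 150 °C sample comparisons at a PCC of 0.90; the prior probability of 0.5 is an arbitrary assumption chosen only for illustration, since in practice the prior is set by the fact finder.

```python
def posterior_probability(prior_probability, likelihood_ratio):
    """Update a prior probability of a common source with a likelihood ratio."""
    if not 0.0 < prior_probability < 1.0:
        raise ValueError("prior probability must lie strictly between 0 and 1")
    prior_odds = prior_probability / (1.0 - prior_probability)
    posterior_odds = likelihood_ratio * prior_odds
    return posterior_odds / (1.0 + posterior_odds)

# Assumed prior of 0.5 (illustration only).
p_common_given_match = posterior_probability(0.5, 32.67)     # about 0.970
p_common_given_nonmatch = posterior_probability(0.5, 0.108)  # about 0.097
```

With a different prior the same LRs lead to different posterior probabilities, which is why the LR on its own conveys the strength of the evidence but not the probability of the proposition.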
Discrimination of LR Systems
The AUC of the ROC plots can be used as an indicator of good discrimination in a binary system. When there is less overlap between the true positive and true negative PCC values, the system is a better discriminating system. The closer the AUC is to 1, the less overlap there is, and the more discriminating the system is. The AUC was very high, greater than 0.98, for the 150 °C, TIS, and BPC/TIS datasets. The 350 °C and the BPC datasets had AUCs less than 0.95; however, these values are still high, indicating that these methods can accurately associate or differentiate samples.
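The link between the AUC and the overlap of the two PCC distributions can be illustrated with a small sketch. Here the AUC is computed as the fraction of (intra, inter) pairs in which the intra-sample PCC is the larger value, and an optimal threshold is chosen with Youden's J statistic (TPR minus FPR). The use of Youden's index is an assumption made for illustration, and the PCC values are hypothetical.

```python
def auc(intra_pcc, inter_pcc):
    """AUC as the probability that an intra value exceeds an inter value (ties count 0.5)."""
    wins = 0.0
    for a in intra_pcc:
        for b in inter_pcc:
            if a > b:
                wins += 1.0
            elif a == b:
                wins += 0.5
    return wins / (len(intra_pcc) * len(inter_pcc))

def youden_threshold(intra_pcc, inter_pcc):
    """Return (threshold, J) maximizing J = TPR - FPR; the lowest threshold wins ties."""
    best_t, best_j = None, -1.0
    for t in sorted(set(intra_pcc) | set(inter_pcc)):
        tpr = sum(1 for r in intra_pcc if r >= t) / len(intra_pcc)
        fpr = sum(1 for r in inter_pcc if r >= t) / len(inter_pcc)
        if tpr - fpr > best_j:
            best_t, best_j = t, tpr - fpr
    return best_t, best_j

# Hypothetical PCC values (illustration only).
intra_pcc = [0.98, 0.95, 0.90, 0.70]
inter_pcc = [0.80, 0.30, 0.10, -0.02]

area = auc(intra_pcc, inter_pcc)
threshold, j_value = youden_threshold(intra_pcc, inter_pcc)
```

In this toy example a single inter-sample value exceeds a single intra-sample value, so one of the sixteen pairs is misordered and the AUC falls just below 1.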
Calibration of LR Systems
To measure the calibration of these likelihood ratios, both PLRs and NLRs, ECE plots were used. The methodology measures the performance of the evidential weight of the LRs (PLRs and NLRs) computed for the experimental set. It is necessary to determine if the LR values are well calibrated because a well-calibrated set of LRs will have a higher discriminating power, based on the classifiers used, and will provide stronger support. ECE plots have three curves that are used to determine if the LR values are calibrated (refer to Figure 8). The black curve is the null, which represents the performance of a classifier (i.e., PCC values) that always delivers an LR = 1. This is also known as the neutral system, where the posterior probability equals the prior probability and the cross entropy is determined by the prior probability alone. The red curve is the ECE observed curve, which reflects the LR values calculated by the classifier. The higher this curve is in the plot, the worse the system is, because more information is needed to know the true hypothesis. Ideally, this curve will be low and very close to the blue curve. The blue curve is the ECE controlled curve, which represents the best calibrated system achievable by the classifier at each value of the prior probability (x-axis). These values are obtained by using the pool adjacent violators (PAV) algorithm.
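The observed and null curves can be sketched with the ECE expression commonly given in the LR calibration literature: the prior-weighted average of log2(1 + 1/(LR × O)) over same-source comparisons plus log2(1 + LR × O) over different-source comparisons, where O is the prior odds. The function name and the LR values below are hypothetical, and the snippet is a sketch of the definition rather than the software used in this study.

```python
import math

def empirical_cross_entropy(lrs_same, lrs_diff, prior_same):
    """ECE in bits for LRs from same-source and different-source comparisons."""
    odds = prior_same / (1.0 - prior_same)
    same_term = sum(math.log2(1.0 + 1.0 / (lr * odds)) for lr in lrs_same) / len(lrs_same)
    diff_term = sum(math.log2(1.0 + lr * odds) for lr in lrs_diff) / len(lrs_diff)
    return prior_same * same_term + (1.0 - prior_same) * diff_term

# Hypothetical LR values (illustration only).
lrs_same = [50.0, 30.0, 8.0]   # LRs obtained for true common-source pairs
lrs_diff = [0.02, 0.1, 0.5]    # LRs obtained for true different-source pairs

observed = empirical_cross_entropy(lrs_same, lrs_diff, prior_same=0.5)
null = empirical_cross_entropy([1.0], [1.0], prior_same=0.5)  # neutral system, LR = 1
```

For the neutral system the expression reduces to the entropy of the prior, which is 1 bit at a prior probability of 0.5; an informative set of LRs gives an observed value below that.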
To determine if the LR values are calibrated and discriminatory, we look to several features of the ECE plots. First, if the difference between the observed and the controlled curves is small, then the LR values are well calibrated, and if the ECE controlled curves are low (close to 0), then the LR values are well calibrated. In the 150 °C datasets, the difference between these two curves is small (Table 5) for both the sample- and class-based LR values. Secondly, if the observed curve does not intersect the null curve, then the classifier will be informative for the entire range of the application, which indicates that this classifier is useful in nearly any application. In our example, the observed curve in both plots is within the null curve, indicating usefulness in any situation (the entire range of the prior log10 odds). Thirdly, to determine the discriminating power, the lower the controlled curve is on the plot, the better the classifier can discriminate, which yields stronger support for the propositions.
All 10 of the datasets were well calibrated and considered to have high discriminatory power for the range of prior probabilities. All of the observed curves were contained within the null curve (Table 5A). The controlled curves were mostly less than 0.2, or 20% of the null (Table 5B). These numbers are low and as such indicate well-calibrated systems. The controlled curve for each dataset was a line at 0, indicating that the best calibrated classifier needs very little information to know the true value of the ground truth. However, the classifier (i.e., observed curve) is not as calibrated as the controlled curve because it needs a little more information than the controlled curve, as indicated in Table 5C. If the prior probability is P(Θ1) = 0.69 (where the prior log10 odds are ca. 0.83), the 150 °C sample dataset only needs 0.074 bits of information more than the controlled curve to know the true value of the ground truth of a comparison. For the sample pairwise comparisons, the datasets that are best calibrated and have the highest discriminating power are the TIS and the BPC/TIS datasets, followed by the 150 °C dataset. The worst analytical methodology was the BPC system, which had the highest controlled apex and the largest difference between the observed and controlled curves. This reiterates what was observed by looking at the LR values alone (Tables 3 and 4), where the lowest LR values were observed with the BPC dataset. The ECE plots for the remaining datasets are presented in the appendix (Fig. S.5).
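The controlled curve relies on the pool adjacent violators algorithm, which can be sketched as follows. The snippet sorts comparisons by score (here, PCC), fits the closest non-decreasing sequence to the 0/1 ground-truth labels, and returns a calibrated posterior probability of a common source for each comparison. Converting these posteriors into LRs, by dividing the posterior odds by the odds implied by the proportion of same-source comparisons, is omitted. The function names and the data are hypothetical.

```python
def pav(values):
    """Pool adjacent violators: least-squares non-decreasing fit to a sequence."""
    blocks = []  # each block holds [sum of values, number of values]
    for v in values:
        blocks.append([float(v), 1])
        # Merge backwards while the previous block mean exceeds the current one.
        while len(blocks) > 1 and (
            blocks[-2][0] / blocks[-2][1] > blocks[-1][0] / blocks[-1][1]
        ):
            total, count = blocks.pop()
            blocks[-1][0] += total
            blocks[-1][1] += count
    fitted = []
    for total, count in blocks:
        fitted.extend([total / count] * count)
    return fitted

def pav_posteriors(scores, same_source):
    """Calibrated P(common source) per comparison, returned in the input order."""
    order = sorted(range(len(scores)), key=lambda i: scores[i])
    fitted = pav([1.0 if same_source[i] else 0.0 for i in order])
    result = [0.0] * len(scores)
    for rank, i in enumerate(order):
        result[i] = fitted[rank]
    return result

# Hypothetical PCC scores with ground truth (illustration only).
scores = [0.10, 0.40, 0.55, 0.60, 0.90]
truth = [False, False, True, False, True]
posteriors = pav_posteriors(scores, truth)
```

The two comparisons whose labels are out of order with respect to their scores are pooled into a single block with a posterior of 0.5, which is how the algorithm enforces that a higher score never yields weaker support for a common source.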
Strength of Accurate Comparison Determination Based on PCC Values
The proposed approach to comparing an unknown and a known lubricant sample via DART-MS analysis is a two-part approach that takes into consideration the PCC value and the associated LR to determine if the two samples are from a common source or a different source (i.e., sample or class). The first part is to calculate the PCC value and compare it with the optimal threshold for the ionization method used (e.g., 150 °C, 350 °C, or BPC) and for the particular type of comparison (i.e., sample or class), to determine if this pairwise comparison is a true positive (i.e., common source) or a true negative (i.e., different source). The second part is the probability that the determination is a true positive or a false positive based on the likelihood ratio calculated using a lubricant database. This is based on using the PLR and NLR associated with the PCC value of the comparison. It is not feasible to determine if a sample is either a true positive or a true negative based on the PCC value alone.
The value calculated for each pairwise comparison can be compared to either the sample dataset or the class dataset for the appropriate ionization method selected for analysis. This provides the analyst with a manner of determining how likely it is that a sample could be associated with an individual sample and class, which could increase the strength of the final determination of common source or different source.
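The two-part approach can be summarized in a few lines of code. The sketch below makes the call by comparing the PCC to a threshold and then reports the LR that accompanies that call. The threshold of 0.90, the PLR of 32.67, and the NLR of 0.108 are the values quoted earlier for the 150 °C sample comparisons; the function name and the two example PCC values are hypothetical.

```python
def assess_comparison(pcc, threshold, plr, nlr):
    """Part 1: call the comparison from the PCC; part 2: attach the matching LR."""
    if pcc >= threshold:
        return "common source", plr
    return "different source", nlr

# Values quoted in the text for the 150 °C sample comparisons at a PCC of 0.90.
THRESHOLD, PLR, NLR = 0.90, 32.67, 0.108

call_high, lr_high = assess_comparison(0.93, THRESHOLD, PLR, NLR)
call_low, lr_low = assess_comparison(0.50, THRESHOLD, PLR, NLR)
```

In an operational setting the LR pair would be looked up from the database built for the chosen ionization method and comparison type rather than hard-coded as it is here.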
If using the verbal scale for LRs, as presented by Martire et al., in the forensic comparison of lubricants, the PLRs for PCC values greater than the optimal PCC threshold offered weak to strong support that the comparisons would be a true positive. Conversely, the NLRs for PCC values less than the optimal threshold offered weak to strong support that these comparisons would be a true negative. The closer the PCC value is to the optimal threshold, the more difficult it will be to distinguish one lubricant from another, because there is a trade-off between true positives and true negatives due to the overlap between these two Gaussian curves. Analyzing flavored/fragranced lubricants will yield strong support for the final comparison determination when using the 150 °C ionization temperature or when obtaining the TIS from the ionRocket. These datasets provide the highest discriminating capability and well-calibrated LRs. It is evident that making a common/different source determination using the BPC yields the worst results, and the BPC should not be used in comparing samples.
The resulting mass spectral profiles showed additive components with lower boiling temperatures at higher abundances (even base peak levels) and yielded better discrimination of the fragrance/flavored lubricants. This study was not intended to be a comparison between different analytical methodologies that could be used to analyze sexual lubricant evidence. Rather, it was intended to illustrate the differences in the spectral profile that can be elucidated by manipulating the desorption temperature. While each ionization procedure discussed herein can be used to analyze and compare unknown and known lubricant samples, several analytical procedures will provide stronger LRs and higher PCC values to support making either a determination that the samples may come from a common source (i.e., sample or class) or from a different source.
The high-temperature analysis conducted at 350 °C performed well for class determinations. However, discrimination within these classes, specifically for samples that would be differentiated based on volatile additives, is best achieved at the lower ionization temperature (150 °C) or by obtaining the TIS from the ionRocket. While the analysis time is slightly longer (8 min per sample) using the thermal desorption/pyrolysis attachment, the TIS generated affords the best individualization of the lubricants in the sample set. This was demonstrated by the fact that KY Ultragel and Jo H2O Water were not separated by either the low or high gas heater DART analysis, as peaks from the propylene glycol component dominated the mass spectra. Only the TIS analysis was able to distinguish these two samples because it was able to identify the sorbitol component present in KY Ultragel.
As sexual lubricants, and water-based lubricants in particular, are complex multicomponent mixtures, each protocol has advantages over the others depending on the analyte(s) the analyst would like to identify. The low-temperature method provides increased information in regard to volatile additives, such as flavors, fragrances, and solubilizers. The high-temperature method provides the highest ion count and allows rapid determination of the lubricant base. The thermal desorption/pyrolysis methodology seems to provide a great compromise between the traditional DART approaches, albeit with a slightly longer analysis time per sample.
Campbell, G.P., Gordon, A.L.: Analysis of condom lubricants for forensic casework. J. Forensic Sci. 52, 630–642 (2007)
Musah, R.A., Vuong, A.L., Henck, C., Shepard, J.R.E.: Detection of the spermicide nonoxynol-9 via GC-MS. J. Am. Soc. Mass Spectrom. 23, 996–999 (2012)
Smith, W.: PGC/MS for condom lubricant analysis. Anal. Chem. 76, 157A (2004)
Baumgarten, B., Marić, M., Harvey, L., Bridge, C.M.: Preliminary classification scheme of silicone based lubricants using DART-TOFMS. Forensic Chem. 8, 28–39 (2018)
Marić, M., Bridge, C.: Characterizing and classifying water-based lubricants using direct analysis in real time®-time of flight mass spectrometry. Forensic Sci. Int. 266, 73–79 (2016)
Moustafa, Y., Bridge, C.M.: Distinguishing sexual lubricants from personal hygiene products for sexual assault cases. Forensic Chem. 5, 58–71 (2017)
Musah, R.A., Cody, R.B., Dane, A.J., Vuong, A.L., Shepard, J.R.: Direct analysis in real time mass spectrometry for analysis of sexual assault evidence. Rapid Commun. Mass Spectrom. 26, 1039–1046 (2012)
Proni, G., Cohen, P., Huggins, L.A., Nesnas, N.: Comparative analysis of condom lubricants on pre & post-coital vaginal swabs using AccuTOF-DART. Forensic Sci. Int. 280, 87–94 (2017)
Durex Stops Making Condoms With Nonoxynol-9 Due to Possible Increased Risk of HIV Transmission. Kaiser Health News (2004)
Nonoxynol-9 spermicidal lubricant. http://www.aidsmap.com/Nonoxynol-9-spermicidal-lubricant/page/1323016/. Accessed 27 Dec 2018
Van Damme, L., Ramjee, G., Alary, M., Vuylsteke, B., Chandeying, V., Rees, H., Sirivongrangson, P., Mukenge-Tshibaka, L., Ettiègne-Traoré, V., Uaheowitchai, C., Abdool Karim, S.S., Mâsse, B., Perriëns, J., Laga, M.: Effectiveness of COL-1492, a nonoxynol-9 vaginal gel, on HIV-1 transmission in female sex workers: a randomised controlled trial. Lancet. 360, 971–977 (2002)
Mirabelli, M.F., Chramow, A., Cabral, E.C., Ifa, D.R.: Analysis of sexual assault evidence by desorption electrospray ionization mass spectrometry. J. Mass Spectrom. 48, 774–778 (2013)
Marić, M., Harvey, L., Tomcsak, M., Solano, A., Bridge, C.: Chemical discrimination of lubricant marketing types using direct analysis in real time time-of-flight mass spectrometry. Rapid Commun. Mass Spectrom. 31, 1014–1022 (2017)
Pybus, D., Sell, C.: The Chemistry of Fragrances, vol. 276. Royal Society of Chemistry, London (1999)
Youden, W.J.: Index for rating diagnostic tests. Cancer. 3, 32–35 (1950)
Kumar, R., Indrayan, A.: Receiver operating characteristic (ROC) curve for medical researchers. Indian Pediatr. 48, 277–287 (2011)
Aitken, C.G.G., Lucy, D.: Evaluation of trace evidence in the form of multivariate data. Appl. Stat. 53, 109–122 (2004)
Koons, R.D., Buscaglia, J.A.: Interpretation of glass composition measurements: the effects of match criteria on discrimination capability. J Forensic Sci. 47, 505 (2002)
Corzo, R., Hoffman, T., Weis, P., Franco-Pedroso, J., Ramos, D., Almirall, J.: The use of LA-ICP-MS databases to calculate likelihood ratios for the forensic analysis of glass evidence. Talanta (2018)
Egli, N.M., Champod, C., Margot, P.: Evidence evaluation in fingerprint comparison and automated fingerprint identification systems--modelling within finger variability. Forensic Sci. Int. 167, 189–195 (2007)
Neumann, C., Champod, C., Puch-Solis, R., Egli, N., Anthonioz, A., Bromage-Griffiths, A.: Computation of likelihood ratios in fingerprint identification for configurations of any number of minutiae. J. Forensic Sci. 52, 54–64 (2007)
Dennis, D.M., Williams, M.R., Sigman, M.E.: Assessing the evidentiary value of smokeless powder comparisons. Forensic Sci. Int. 259, 179–187 (2016)
Pierrini, G., Doyle, S., Champod, C., Taroni, F., Wakelin, D., Lock, C.: Evaluation of preliminary isotopic analysis (13C and 15N) of explosives: a likelihood ratio approach to assess the links between semtex samples. Forensic Sci. Int. 167, 43–48 (2007)
Maric, M., Marano, J., Cody, R.B., Bridge, C.: DART-MS: a new analytical technique for forensic paint analysis. Anal. Chem. (2018)
Pluskal, T., Castillo, S., Villar-Briones, A., Oresic, M.: MZmine 2: modular framework for processing, visualizing, and analyzing mass spectrometry-based molecular profile data. BMC Bioinformatics. 11, 395–405 (2010)
van der Helm, H.J., Hische, E.A.H.: Application of Baye’s theorem to results of quantitative clinical chemical determinations. Clin. Chem. 25, 985–988 (1979)
Zweig, M.H., Campbell, G.: Receiver-operating characteristic plots: a fundamental evaluation tool in clinical medicine. Clin. Chem. 39, 561–577 (1993)
Perkins, N.J., Schisterman, E.F.: The inconsistency of “optimal” cutpoints obtained using two criteria based on the receiver operating characteristic curve. Am. J. Epidemiol. 163, 670–675 (2006)
Martire, K.A., Kemp, R.I., Sayle, M., Newell, B.R.: On the interpretation of likelihood ratios in forensic science evidence: presentation formats and the weak evidence effect. Forensic Sci. Int. 240, 61–68 (2014)
Gross, J.H.: Direct analysis in real time--a critical review on DART-MS. Anal. Bioanal. Chem. 406, 63–80 (2014)
Ramos, D., Franco-Pedroso, J., Lozano-Diez, A., Gonzalez-Rodriguez, J.: Deconstructing cross-entropy for probabilistic binary classifiers. Entropy. 20, 208–228 (2018)
Ramos, D., Gonzalez-Rodriguez, J.: Reliable support: measuring calibration of likelihood ratios. Forensic Sci. Int. 230, 156–169 (2013)
The authors would like to acknowledge Chikako Takei at BioChromato for providing the ionRocket used in this research.
This work was supported by the State of Florida (USA) and by the National Institute of Justice (USA) [Grant No. 2016-DN-BX-0001].
Cite this article
Bridge, C., Marić, M. Temperature-Dependent DART-MS Analysis of Sexual Lubricants to Increase Accurate Associations. J. Am. Soc. Mass Spectrom. 30, 1343–1358 (2019). https://doi.org/10.1007/s13361-019-02158-x
- Lubricant analysis
- Likelihood ratios
- Pearson correlation