Critical evaluation of the role of external calibration strategies for IM-MS

The major benefits of integrating ion mobility (IM) into LC–MS methods for small molecules are the additional separation dimension and especially the use of IM-derived collision cross sections (CCS) as an additional ion-specific identification parameter. Several large CCS databases are now available, but outliers in experimental interplatform IM-MS comparisons are identified as a critical issue for routine use of CCS databases for identity confirmation. We postulate that the different routine external calibration strategies applied for traveling wave (TWIM-MS) instruments in comparison to drift tube (DTIM-MS) and trapped ion mobility (TIM-MS) instruments are a critical factor affecting interplatform comparability. In this study, different external calibration approaches for IM-MS were experimentally evaluated for 87 steroids, for which TW CCS N2, DT CCS N2 and TIM CCS N2 values are available. New reference CCS N2 values for commercially available and class-specific calibrant sets were established using DTIM-MS, and the benefit of using consolidated reference values on the comparability of CCS N2 values was assessed. Furthermore, use of a new internal correction strategy based on stable isotope labelled (SIL) internal standards was shown to have potential for reducing systematic error in routine methods. After reducing bias for CCS N2 between different platforms using new reference values (95% of TW CCS N2 values fell within 1.29% of DT CCS N2 and 1.12% of TIM CCS N2 values, respectively), remaining outliers could be confidently classified and further studied using DFT calculations and CCS N2 predictions. Despite large uncertainties for in silico CCS N2 predictions, discrepancies in observed CCS N2 values across different IM-MS platforms as well as non-uniform arrival time distributions could be partly rationalized.
Supplementary Information: The online version contains supplementary material available at 10.1007/s00216-022-04263-5.


Introduction
High-resolution mass spectrometry (MS) coupled to liquid chromatography (LC) has evolved into a key technology for the analysis of small molecules in metabolomics, lipidomics, environmental analytics and related disciplines [1][2][3]. Due to the chemical diversity of small molecules and the large variety of possible isomers and isobars present, increasing method selectivity by enhancing peak capacity remains of great interest [4,5]. In this regard, ion mobility coupled to mass spectrometry (IM-MS) increases peak capacity and improves signal-to-noise ratios for such applications [2,3,6,7]. Importantly, due to the speed of gas-phase separations of ions, IM is readily integrated in LC-MS workflows without compromising total analysis time [8,9]. Different types of IM-MS analysers are now commercially available, including drift tube (DTIM-MS), travelling wave (TWIM-MS) and trapped ion mobility (TIM-MS) instruments [8,10]. In IM-MS, analyte ions are separated based on the opposing forces of an applied electric field and collisions with a neutral buffer gas (typically nitrogen) before entering the MS analyser [8,11]. As a derived property, the collision cross section (CCS) of an ion can be calculated, and excellent interlaboratory precision, typically in the range of 1-2%, has been reported in several studies [12][13][14]. Moreover, the increasing number of curated and freely available CCS databases [15][16][17] has popularized the use of CCS values as an identification parameter intended for standard-free identification workflows [17][18][19].
However, in contrast to mass to charge ratios (m/z), CCS is a conditional value of an ion that cannot be calculated in a straightforward manner. Moreover, experimentally observed ion structures may be influenced by the employed experimental parameters including the ESI source conditions, as well as applied voltages and source temperatures [11]. Of the major commercial instrument types, DTIM-MS is most closely related to fundamental IM theory, and DT CCS N2 values can be derived using a primary method of measurement (i.e. stepped-field method) or via secondary methods (i.e. single-field calibrated) on a routine basis [12]. However, uncertainties associated with reference values remain due to lack of standardization and reference materials. These uncertainties therefore directly influence the different secondary calibration approaches that are applied on a routine basis for CCS determination using DTIM-MS, TWIM-MS and TIM-MS [20]. Especially for TWIM-MS, the applied calibration strategy including the selection of calibrant ions has been reported to influence comparability of TW CCS N2 values [21]. Fundamental differences between TW CCS N2 and DT CCS N2 values due to ion transport and ion heating effects have also been discussed as potential sources of differences observed between IM-MS platforms [22].
In context of small molecules, steroid analysis is of special interest due to the large number of possible isomers, and benefits of IM-MS for steroid analysis were demonstrated previously [7,13,15,23,24]. From the analytical applications perspective, the comparability of CCS N2 for steroid analysis using three different IM-MS technologies was recently investigated, and interlaboratory bias of < 2% for the majority of the investigated ions was demonstrated [14]. Nevertheless, large deviations (up to 7%) of CCS values derived from TWIM-MS and TIM-MS to DT CCS N2 values have also been reported [14,25]. In addition to the possibility of fundamental differences in ion conformations generated and sampled by the different IM-MS instruments, systematic bias of TW CCS N2 values compared to other IM-MS instruments is evident and may have its origin in the applied external calibration [14,25]. Alongside analytically challenging examples such as ions with complex arrival time distributions and the high level of effort required for computational prediction of CCS values using density functional theory (DFT) for large datasets [26], this issue leaves the use of CCS N2 values as an IM-MS technology-independent identification parameter in a currently unsatisfactory position [14]. Therefore, to further investigate the effect of the applied calibration approach and especially the role of reference values used for TW CCS N2 calibration, alternative external calibration and internal correction approaches are explored in the present work. In addition to matching calibrant class to sample type, unified calibrant sets and stable isotope label (SIL) internal correction strategies are investigated. With a goal of elucidating the magnitude of calibration-dependent bias, this work aims to support efforts toward long-term applicability of CCS N2 values for analytical small molecule applications.
Ultrapure water from a Milli-Q IQ 7000 purification system and LC-Pak® polisher cartridge (Merck Chemicals and Life Science GmbH, Vienna) along with LC-MS grade acetonitrile (ACN) and formic acid (FA) from Sigma-Aldrich were used to prepare eluents and to dilute standards prior to LC-DTIM-MS analysis. ESI-L Tune Mix (G1969-85000, Agilent Technologies) along with 0.1 mmol/L HP-0321 (Agilent Biopolymer Reference Kit) was used for mass calibration of the Agilent 6560 DTIM-QTOF and for determination of DT CCS N2 values using the single-field calibration method [12], and was also tested for TW CCS N2 calibration on the Waters Synapt G2-S.
Sodium formate (0.5 mmol/L in 90:10 (v/v) 2-propanol:water) was prepared from sodium hydroxide (1 mol/L, Fisher Chemical™) and formic acid (Promo-chem®) supplied by Fisher Scientific (Loughborough, UK) and was used for mass calibration of the Synapt G2-S. Major Mix IMS/ToF Calibration Kit (Waters, Wilmslow, UK) was used for TW CCS N2 calibration and is referred to as "CCS Major Mix" in the following sections.
An Agilent 6560 IM-QTOFMS equipped with a Dual Jetstream ESI source was used for determination of new DT CCS N2 reference values for Waters CCS Major Mix and stable isotope labelled (SIL) steroids (see Electronic Supplementary Information Tables S1-S3).

Sample preparation
Agilent ESI-L tune mix was prepared according to manufacturer instructions for the ion source used in this study. Briefly, a 1:10 dilution of ESI-L Tune Mix was prepared in 95:5 (v/v) water:ACN and additionally spiked with HP-0321 (hexamethoxyphosphazine). A set of 87 steroids used in previous interlaboratory comparisons of different IM-MS systems was also used in this study [13,14]. Mixtures of standards were prepared at 0.5 µg/mL for LC-TWIM-MS analysis; water-soluble steroids were prepared in 95:5 (v/v) 0.1% FA:ACN, while hydrophobic steroids (e.g. sterol esters) were prepared in 50:50 (v/v) 0.1% FA:ACN according to an established protocol [14]. For investigating the possibility of SIL-supported internal correction for TW CCS N2 calibration, standard mixtures were spiked with SIL-steroid standards to yield a final concentration of 0.5 mg/L.

Instrumentation and data acquisition
Previously established RPLC and DTIM-MS methods were used to analyse stable isotope labelled steroids and CCS Major Mix calibrant ions. For this purpose, an Agilent 6560 IM-QTOFMS equipped with a Dual Jetstream ESI source was used. For DTIM-MS analysis, mixtures were directly infused using a syringe pump (KD Scientific, series 100, USA) at a flow rate of 20 µL/min. Applied method parameters have been previously reported and are summarized in the Electronic Supplementary Information [14].
The same LC method was used for front-end separation along with TWIM-MS measurements using a Waters Synapt G2-S TWIM-MS system. An Acquity UPLC System (Waters) equipped with an Acquity UPLC® BEH C18 column (2.1 mm × 100 mm, 1.7 µm; Waters) was used along with previously established methods [15]. Prior to analysis, sodium formate (0.5 mmol/L in 90:10 (v/v) 2-propanol:water) was used for mass calibration, while lock mass correction was applied during measurements using leucine enkephalin (1-2 ng/mL in 50:50 (v/v) 0.2% FA:ACN). The instrument was CCS calibrated using (1) CCS Major Mix or using alternative calibration approaches with (2) Agilent ESI-L tune mix or (3) a mix of steroids with newly established CCS N2,ref values (see Electronic Supplementary Information Table S3). The two commercial solutions were prepared according to vendor instructions, while the mixture of steroid standards was prepared at a concentration of 10 mg/L in 50:50 (v/v) 0.1% FA:ACN. The TWIM-MS acquisition methods in ESI+ and ESI− modes were optimized according to the applied CCS calibration mixture (see Electronic Supplementary Information). Finally, TIM CCS N2 data from a recent study were also used for comparisons of new IM-MS calibration and correction strategies [14].

Data processing and visualization
Single-field calibration for DTIM-MS was applied using Agilent IM-MS Browser 10.0. Single-field calibrated data was demultiplexed and pre-processed using PNNL Preprocessor 3.0 (2021.04.21) [27], and Agilent MassHunter Mass Profiler 10.0 was used for peak picking and alignment of triplicate measurements [14].
For TWIM-MS, DriftScope V.2.8 included in MassLynx 4.2 software (Waters) was used to determine the TW CCS N2 calibration functions, which were saved into the corresponding measurement data files. Individual data files were investigated using DriftScope, and MS-DIAL 4.60 [28,29] was used to batch-process TWIM-MS data. To this end, data files in raw format were converted to .ibf files using the built-in converter. Settings used for peak picking and alignment are provided in the Electronic Supplementary Information. TW CCS N2 values were calculated from arrival times using the Enhanced Duty Cycle (EDC) coefficient to correct arrival times [30], and a detailed description of the applied calibration approach is presented in the Electronic Supplementary Information.
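The exact calibration procedure is given only in the Electronic Supplementary Information, so the following is a minimal sketch of a commonly used power-law form of TWIM-MS calibration rather than the authors' implementation. It assumes an EDC-type arrival time correction t' = t − C·√(m/z)/1000, a fit of CCS' = A·t'^B on calibrant ions, and an ion mass approximated as m/z × |z|. All function names, the nitrogen mass constant and the example coefficient are illustrative assumptions.

```python
import math

N2_MASS = 28.0061  # Da; assumed (approximate) mass of the N2 drift gas


def reduced_mass(ion_mass, gas_mass=N2_MASS):
    """Reduced mass of the ion / drift-gas pair (Da)."""
    return ion_mass * gas_mass / (ion_mass + gas_mass)


def modified_ccs(ccs, ion_mass, z):
    """CCS' = CCS * sqrt(mu) / z, with z taken as the absolute charge number."""
    return ccs * math.sqrt(reduced_mass(ion_mass)) / abs(z)


def corrected_time(t_ms, mz, edc):
    """Assumed EDC-type correction of the arrival time (ms)."""
    return t_ms - edc * math.sqrt(mz) / 1000.0


def fit_power_law(corrected_times, ccs_mod):
    """Least-squares fit of ln(CCS') = ln(A) + B * ln(t'); returns (A, B)."""
    xs = [math.log(t) for t in corrected_times]
    ys = [math.log(c) for c in ccs_mod]
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    b = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sum((x - mx) ** 2 for x in xs)
    a = math.exp(my - b * mx)
    return a, b


def ccs_from_time(t_ms, mz, z, a, b, edc):
    """Convert a measured arrival time into a calibrated CCS value."""
    ion_mass = mz * abs(z)  # assumption: adduct/electron masses ignored
    t_corr = corrected_time(t_ms, mz, edc)
    return a * t_corr ** b * abs(z) / math.sqrt(reduced_mass(ion_mass))
```

In such a scheme the reference CCS values of the calibrant ions enter only through `modified_ccs`, which is why a change of reference values propagates directly into every calibrated TW CCS N2 value.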

Stable isotope label (SIL)-based internal correction
For establishing an application-specific internal correction strategy, linear models to describe the relationship between the CCS-ratio and modified CCS (CCS' = DT CCS N2,ref × √µ / z, where µ is the reduced mass and z is the charge number) were established based on a set of twelve SIL steroids, for which new DT CCS N2,ref values were determined (see Table S4). The mixture of SIL steroids was added to all samples to yield a final concentration of 0.5 mg/L in each vial.
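The internal correction described above can be sketched as follows. This is a hedged illustration, not the authors' code: it assumes the CCS-ratio is defined as measured TW CCS N2 divided by DT CCS N2,ref, that the ratio is modelled as a straight line in CCS', and that for an analyte without a reference value CCS' is evaluated from its measured TW CCS N2. Function names and the nitrogen mass constant are hypothetical.

```python
import math

N2_MASS = 28.0061  # Da; assumed (approximate) mass of the N2 drift gas


def modified_ccs(ccs, ion_mass, z, gas_mass=N2_MASS):
    """CCS' = CCS * sqrt(mu) / z, with mu the ion / drift-gas reduced mass."""
    mu = ion_mass * gas_mass / (ion_mass + gas_mass)
    return ccs * math.sqrt(mu) / abs(z)


def fit_line(xs, ys):
    """Ordinary least-squares line; returns (slope, intercept)."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    slope = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sum((x - mx) ** 2 for x in xs)
    return slope, my - slope * mx


def fit_sil_correction(sil_standards):
    """sil_standards: list of (tw_ccs, dt_ccs_ref, ion_mass, z) for the SIL ions."""
    xs = [modified_ccs(dt_ref, m, z) for _, dt_ref, m, z in sil_standards]
    ys = [tw / dt_ref for tw, dt_ref, _, _ in sil_standards]  # CCS-ratio
    return fit_line(xs, ys)


def correct_tw_ccs(tw_ccs, ion_mass, z, slope, intercept):
    """Divide a measured TW CCS by the ratio predicted at its CCS'."""
    # Assumption: CCS' of the analyte is computed from its measured TW CCS.
    predicted_ratio = slope * modified_ccs(tw_ccs, ion_mass, z) + intercept
    return tw_ccs / predicted_ratio
```

With only a handful of SIL ions the fitted slope and intercept are poorly constrained, which is consistent with the need for a sufficiently large SIL set.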

Comparison of datasets
Bias between new experimental data and literature values was calculated as follows:
A summary of new experimental data recorded and datasets from literature used for comparison is provided in Table 1.
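The bias equation itself is not reproduced in this text. As a minimal sketch, assuming the conventional relative definition (new value minus literature value, divided by the literature value, in percent), the summary statistics quoted in this work (mean bias, standard deviation and 95th percentile of absolute bias) could be computed as shown here. The function names are illustrative.

```python
def percent_bias(measured, reference):
    """Relative bias in percent (assumed definition)."""
    return 100.0 * (measured - reference) / reference


def percentile(values, q):
    """Percentile with linear interpolation between sorted values."""
    s = sorted(values)
    pos = (len(s) - 1) * q / 100.0
    lo = int(pos)
    hi = min(lo + 1, len(s) - 1)
    return s[lo] + (s[hi] - s[lo]) * (pos - lo)


def summarize_bias(pairs):
    """pairs: list of (measured_ccs, reference_ccs); needs at least two pairs.

    Returns (mean bias, sample standard deviation, 95th percentile of |bias|).
    """
    biases = [percent_bias(m, r) for m, r in pairs]
    n = len(biases)
    mean = sum(biases) / n
    sd = (sum((b - mean) ** 2 for b in biases) / (n - 1)) ** 0.5
    p95 = percentile([abs(b) for b in biases], 95)
    return mean, sd, p95
```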

Computational methods
Gaussian 16 software was used for DFT calculations. Ion structures were fully optimized by density functional theory (DFT) with B3LYP and ωB97xD functionals. The basis set 6-311++G(d,p) including both diffuse and polarization functions was used for the calculations. Frequency calculations were performed at the same level of theory at 298.15 K to find optimized structures for local minima. Charge distribution was calculated using the Merz-Kollman (MK) method. The Gaussian output files containing geometrical parameters of the candidate structures and MK charges were used to build input files for CCS N2 calculations. CCS N2 calculations were performed using MOBCAL-MPI software using the trajectory method (TM) [33,34]. CCS N2 values were predicted for 298 K in 10 cycles. Velocity integration was set to 48 and impact integration was set to 512 in the graphical user interface.
[...] (Fig. 1). Additionally, the intercepts and coefficients of the obtained linear models resemble the linear fits for the steroid data in our previous study in a direct comparison of DT CCS N2 with the reference values (see Fig. 1, Table S1 and Table S2) [14]. It was also noteworthy that some of the investigated calibrant ions exhibited non-uniform arrival time distributions on the DTIM-MS system, which may influence their reliability for CCS N2 calibration, particularly for high-resolution IM-MS (see Figure S1). Given that the resolution of most current IM-MS instrumentation does not permit full resolution of these apparent conformers, software-based peak picking results from DTIM-MS data with native resolution (50-60) were used for further work. The bias of derived TW CCS N2 values from each approach was compared to previously published DT CCS N2 (Fig. 2a) [...] datapoints in the relevant CCS' range when Agilent ESI-L tune mix is used is low (see Figure S2).
The effect of applying a new calibration on the observed CCS' dependency of reported bias in a large dataset was further investigated using Pearson correlation (Fig. 3).
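As an illustration of the correlation analysis, the Pearson coefficient between per-compound bias and CCS' can be computed directly from its definition. This is a generic sketch with a hypothetical function name; it does not reproduce the software used by the authors.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equally long sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    var_x = sum((x - mx) ** 2 for x in xs)
    var_y = sum((y - my) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)
```

Here `xs` would hold the CCS' values and `ys` the corresponding percent bias values. A coefficient close to zero after recalibration indicates that the bias no longer trends with CCS'.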

Evaluation of alternative external calibration strategies for TWIM-MS
While a moderate positive correlation of the bias between DT CCS N2 values and the interlaboratory dataset with respect to CCS' is apparent (Pearson r = 0.535), this correlation could be diminished after calibration with new DTIM-MS reference values for CCS Major Mix (Pearson [...]
Fig. 2 Bias data according to applied external calibration strategies compared to (a) DT CCS N2 and (b) TIM CCS N2 values with respect to published single laboratory data (SL) [15] and interlaboratory data (IL) [13] that employed the vendor-recommended procedure for TW CCS N2 calibration. Shown alongside are new experimental TWIM-MS data calibrated using the Agilent ESI-L tune mix approach (A), newly determined DT CCS N2 reference values for CCS Major Mix (B), and class-specific external calibrant ions (ST)
[...] [14] using the calibration approach with new reference values for CCS Major Mix (see Table S12). Previously, a bias of 2.3% between interlaboratory TW CCS N2 and DT CCS N2 was observed, but this was reduced to 0.5% using the new reference values for TWIM-MS calibration, indicating that the same ion conformation appears to be sampled on both DTIM-MS and TWIM-MS.

Evaluation of internal correction using stable isotope labelled standards
As the establishment of new reference values for IM-MS calibrants for routine CCS determination is not a trivial task, further analytical strategies to reduce bias between datasets were considered. One such candidate method is the employment of correction functions based on a set of internal standards spiked into all samples to be used for multiple correction functions or for internal calibrations [35,36]. While the use of natural internal standards limits the number of compounds that can be spiked, SIL-based internal standards are ideal candidates for application to LC-IM-MS methods by exploiting the alignment of isotopologues in both LC and IM dimensions [18,37,38]. Due to their frequent use as internal standards for quantitative purposes, the potential of increasing the scope of this approach to include an internal correction for CCS N2 determination was also considered here for the first time for steroid analysis. To this end, DT CCS N2 values were established for SIL-compounds and used to monitor bias of externally calibrated TW CCS N2 values (Table S4). The ratios of measured TW CCS N2 values and DT CCS N2 values were then used to monitor systematic bias trends as a function of CCS' and to derive empirical correction factors from linear models, which were applied to experimental TW CCS N2 values (Figure S3); results are presented in detail in Figure S4 and Table S13. One major observation is that due to the broad bias distributions encountered in all externally calibrated datasets (i.e. standard deviations between 0.5% and 1.1% were observed), a sufficiently large number of SIL internal standards appears to be necessary for internal correction strategies to achieve appropriate correction of calibration-dependent systematic bias. This is a practical challenge for many applications as SIL internal standards are typically expensive, and availability is limited or non-existent for some molecular classes.
This was found to be true in the case of negative mode for this application, where only two suitable SIL-steroid compounds (one sterol sulphate and one sterol glucuronide) could be employed in this study. Therefore, SIL-based internal correction was only applied to protonated and sodiated ions in ESI+ data (Table 3). This correction strategy was applied to datasets that were externally calibrated using the native steroid mix (ST-SIL) and using newly determined DT CCS N2 values as reference for the routinely used CCS Major Mix (B2-SIL). Prior to application of the correction, a systematic positive bias for the ST dataset was observed (0.61% ± 0.69% compared to DT CCS N2 data), while the systematic bias was negligible for the B2 dataset (−0.08% ± 0.59% compared to DT CCS N2 data). Application of SIL-based correction was found to reduce the average absolute bias of the ST dataset with respect to both corresponding DT CCS N2 and TIM CCS N2 data (Figure S4). The significance of this improvement was tested using a nonparametric Wilcoxon test, which revealed that the improvement from the SIL-based correction of the ST data (ST-SIL) was significant (p < 0.05), whereas the corresponding change of the bias distribution in B2 data (B2-SIL) was not significant (p > 0.05, see Figure S4). For the B2 data, the effect of internal correction was thus found to be negligible, and good agreement with DT CCS N2 and TIM CCS N2 with 95th percentiles in the range of 1.0-1.3% was maintained (see Table 3). The difference between the B2-SIL data and ST-SIL data was also investigated and was found to be significant (see Figure S5). Taken together, these results demonstrate that, while the internal correction method based on SIL analogues can reduce systematic bias in such datasets and has potential for method-specific application across different IM-MS platforms, standardization of external calibration strategies remains the most critical issue for CCS determination using TWIM-MS. Rose et al. [35] noted similar observations during optimization of the calibration procedure of a high-resolution SLIM-MS device for the analysis of lipids.
The results also highlight challenges faced in calibration of TWIM-based technology and the need for optimization of external calibration approaches for the calibration of new IM-MS technologies used for small molecule analysis.

Understanding outliers using in silico calculations
While the application of alternative calibration and internal correction approaches was shown in this work to minimize average and systematic bias between TW CCS N2 data and reference CCS N2 data from both DTIM-MS and TIM-MS instruments, a small subset of outliers within the individual datasets remained. These now almost unambiguous outlier values may represent true conformational differences of the corresponding ions sampled on the different instruments or may be the result of more complex behaviour such as dissociation or intermediate complex formation. To investigate these outliers in detail, DFT calculations were used to determine the structures of possible conformers, protomers, and deprotomers of some of these outliers, followed by CCS N2 prediction for the candidate geometries using MOBCAL-MPI software [34]. The two methods used for structural optimization (B3LYP and ωB97xD functionals) can yield differences in optimized geometries, particularly for ions with flexible structures and considering the inclusion of atom-atom dispersion corrections in ωB97xD [39]. Thus, to benchmark performance of the employed workflow, CCS N2 values of several common reference ions used routinely for IM-MS calibration were calculated for the structures optimized by these two functionals. As each ion can have several conformers or (de)protomers, structural optimization, charge distribution analysis, and CCS N2 calculations were carried out for all these isomers. The ωB97xD-optimized structures of several common tune ions and the relative stabilities of conformers and (de)protomers were compared using the calculated Gibbs free energies (Figures S6-S10), and predicted CCS N2 values for the most stable candidate geometries are compared with the experimentally determined DT CCS N2 values in Tables S5-S9. Overall, the ωB97xD-predicted CCS N2 values were found to be in better agreement with the experimental values, but the uncertainty of such predictions remains large. All candidate geometries and corresponding CCS N2 values for the [M + H]+ ions of acetaminophen and verapamil could nevertheless be tentatively correlated to the experimental DTIM-MS spectra presenting non-uniform distributions for these ions (Figure S1).
Both the protonated and sodiated adducts of boldenone undecylenate (BU) showed unexpected IM behaviours in DT, TW, and TIM. As only the protonated ion was experimentally observed with good abundance on all instrument platforms, this was the focus for additional computational predictions. The optimized candidate geometries and relative energies of [BU + H]+ are shown in Fig. 4. The small difference between the Gibbs free energies of the candidate geometries indicates multiple possible candidate structures for this ion. In the absence of any external collision or energy, the Boltzmann distribution at 298 K for the conformers a, b, c, d and e is 93.2%, 6.0%, 0.2%, 0.5% and 0.1%, respectively. While these predictions allow rationalization of experimental results, the observed distribution of the conformers is expected to depend on the ion source geometry and conditions (i.e. temperatures, voltages) experienced by the ion on the respective IM-MS platforms, even when all other analytical method parameters (i.e. LC flow rate, solvent composition) are kept consistent. This is a major challenge for development of interplatform CCS databases covering compounds with a high degree of flexibility leading to complex arrival time distributions, where multiple CCS values cannot routinely be compared due to both the differences in ion source conditions and the resolving power of different IM-MS platforms.
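Conformer populations of this kind follow from the relative Gibbs free energies via Boltzmann weighting, p_i = exp(−ΔG_i/RT) / Σ_j exp(−ΔG_j/RT). The sketch below illustrates the calculation. The energies actually used are shown in Fig. 4 and are not reproduced here, so any input values supplied to this function are the reader's own, and the function name is hypothetical.

```python
import math

R_KJ = 8.314462618e-3  # gas constant in kJ mol^-1 K^-1


def boltzmann_populations(rel_gibbs_kj, temperature=298.0):
    """Fractional populations from relative Gibbs free energies (kJ/mol)."""
    weights = [math.exp(-g / (R_KJ * temperature)) for g in rel_gibbs_kj]
    total = sum(weights)
    return [w / total for w in weights]
```

At 298 K, RT is about 2.48 kJ/mol, so a conformer lying only a few kJ/mol above the global minimum still carries a population of several percent. Small errors in computed energies therefore translate into large changes in predicted populations.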
In the negative mode, the measured CCS N2 values for estradiol diglucuronide, [ED-H]−, in DTIM-MS, TWIM-MS and TIM-MS are 238.2, 254.3 and 253.8 Å², respectively [14]. Due to the structure of this compound with its two glucuronic acid groups, the occurrence of two distinct deprotomers with different CCS is a possible explanation for the differences observed between the different instruments. DFT calculations showed that while both deprotomers (see [ED-H]−-a and [ED-H]−-b in Figure S11) can be formed in ESI−, the predicted CCS N2 values for these deprotomers differed by only 1.3% (see Table S10), indicating that differences observed between the experimental CCS N2 values from different IM-MS platforms for [ED-H]− are too large to be interpreted as distinct deprotomers being observed on different types of IM-MS instruments. Therefore, further conformations of [ED-H]− were also considered. The calculated CCS N2 values for the conformers of this ion (Table S10, [...] (Table S11) and experimental CCS N2 of [ED-2H + Na]− was found for the open conformations, which suggests that a compact conformation of this ion is experimentally observed, e.g. a CCS N2 of 252 Å² was predicted for conformer f. The greater stability due to lower Gibbs energy (~ 70-100 kJ/mol) of the closed conformation is thus a plausible explanation for the stability of this conformer across different IM-MS platforms. Therefore, one plausible explanation for the [ED-H]− results is that, despite being energetically less favourable, the relatively small energetic difference (~ 15 kJ/mol) may allow formation of the closed [...]
Finally, as exhaustive review of the DTIM-MS data acquired with different measurement conditions could not provide confirmation of this first hypothesis, a second feasible origin of the discrepancies between IM-MS platforms was also considered, namely ion transport effects occurring on the TWIM-MS and TIM-MS platforms.
A plausible mechanism would involve formation and transport of multimeric species that are dissociated in a post-IM region and then detected as [ED-H]−. Some experimental evidence supporting this hypothesis from DTIM-MS data is presented in Figure S13. Overall, these results highlight the difficulty of unambiguously correlating CCS predictions with experimental values due not only to previously reported issues with predictions using nitrogen as drift gas [40][41][42], but also the fundamental challenge of correctly optimizing candidate geometries and the additional potential for non-ideal behaviour such as clustering that can lead to major discrepancies between CCS values reported on different IM-MS platforms.
Fig. 4 The Gibbs free energy diagram and optimized structures for five of the most stable conformers of [BU + H]+. The relative Gibbs energies (numbers in parentheses) and calculated CCS values are in kJ mol−1 and Å², respectively. The inset shows arrival time spectra determined using DTIM-MS using 4-bit multiplexing (solid line) and high-resolution demultiplexing (dashed line) [14]

Conclusion
Investigation of calibration-dependent bias and testing of alternative calibration sets for TWIM-MS in this work highlights the critical importance of external calibration for CCS N2 determination using IM-MS. While good agreement between different types of IM-MS was shown in previous research for steroid analysis (i.e. bias < 2% for most investigated compounds), use of new CCS N2,ref values for routinely used TWIM-MS calibrant ions is shown to be best-suited for amelioration of the CCS'-dependent trends observed with respect to both DTIM-MS and TIM-MS. This improvement is also of fundamental importance for differentiating ions with true structural differences observed on different instruments from outliers that are in fact resultant from calibrant-dependent effects.
Other analytical strategies investigated in this work showed limitations for the application investigated. A unified calibration with Agilent ESI-L tune mix and a class-specific calibration mixture (steroids) could not improve the average bias between TW CCS N2 and reference CCS N2 values. However, a new approach using stable isotope labelled (SIL) internal standards to internally correct TW CCS N2 data using ratios of DT CCS N2 values and measured TW CCS N2 values of internal standards significantly improved agreement between datasets from different IM-MS platforms. Although this SIL-based approach can be cost-prohibitive and cannot replace proper external TW CCS N2 calibration, it may be a candidate method that can be applied across IM-MS platforms for specific applications.
DFT calculations in combination with CCS N2 prediction could provide rational explanations for some experimental observations according to alternative ion conformations, flexibility of side chains or the formation of multimeric ion clusters for some ions. Although a detailed mechanistic understanding of observed differences is not always possible due to the relatively high uncertainties associated with such in silico predicted CCS values, such methods are valuable to test hypotheses for individual examples in small molecule IM-MS datasets.