PFΔScreen — an open-source tool for automated PFAS feature prioritization in non-target HRMS data

Per- and polyfluoroalkyl substances (PFAS) are a large group of anthropogenic chemicals with unique properties that are used in countless products and applications. Due to the high stability of their C-F bonds, PFAS or their transformation products (TPs) are persistent in the environment, leading to ubiquitous detection in various samples worldwide. Since PFAS are industrial chemicals, the availability of authentic PFAS reference standards is limited, making non-target screening (NTS) approaches based on high-resolution mass spectrometry (HRMS) necessary for a more comprehensive characterization. NTS is usually a time-consuming process, since only a small fraction of the detected chemicals can be identified. Therefore, efficient prioritization of relevant HRMS signals is one of the most crucial steps. We developed PFΔScreen, a Python-based open-source tool with a simple graphical user interface (GUI) to perform efficient feature prioritization using several PFAS-specific techniques such as the highly promising MD/C-m/C approach, Kendrick mass defect analysis, diagnostic fragments (MS2), fragment mass differences (MS2), and suspect screening. Feature detection from vendor-independent MS raw data (mzML, data-dependent acquisition) is performed via pyOpenMS (or custom feature lists) with subsequent calculations for prioritization and identification of PFAS in both HPLC- and GC-HRMS data. The PFΔScreen workflow is presented on four PFAS-contaminated agricultural soil samples from south-western Germany. Over 15 classes of PFAS (more than 80 single compounds with several isomers) could be identified, including four novel classes, potentially TPs of the precursors fluorotelomer mercapto alkyl phosphates (FTMAPs). PFΔScreen can be used within the Python environment and can be installed and run automatically on Windows. Its source code is freely available on GitHub (https://github.com/JonZwe/PFAScreen).
Supplementary Information: The online version contains supplementary material available at 10.1007/s00216-023-05070-2.


S1 Installation of PFΔScreen
PFΔScreen can be installed and executed within the standard Python environment or by using the Anaconda distribution. To make installation and use as easy as possible, PFΔScreen can be automatically installed with the Installation.bat file and executed with the Run_PFAScreen.bat file. Users familiar with Python can also execute the source code with their own custom environment and editor. In the following, the two steps needed for a simple installation are explained.
Afterwards, MS2 spectra displayed by the RawDataVisualization tool (MS2 extractor) have highlighted fragment mass differences and diagnostic fragments, if any were detected.
After executing the PFASPrioritization tab, the PFΔScreen results table (Excel format and additional CSV file, Fig. S2) and several interactive HTML plots (Fig. S3) are saved in a folder named after the sample, where they can be easily inspected. The plots include an MD/C-m/C plot, an m/z vs. RT plot (with and without MS2 raw data), a KMD plot with a linked m/z vs. RT plot (to verify systematic RT-shifts), and an m/C histogram. Data from the results table can be used to visualize EICs (and extrapolate homologous series (HS) with common repeating units such as CF2), MS1 and MS2 spectra. Additionally, a coelution correlation can be performed with the RawDataVisualization tool. The theoretical isotope patterns of suspect hits can also be displayed over the experimental isotope patterns (MS1) (see Fig. S4).
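To make the quantities behind the MD/C-m/C plot and the m/C histogram concrete, the following is a minimal sketch and not the PFΔScreen implementation. It assumes that the carbon number is estimated from the M+1/M isotope intensity ratio with an approximate 13C/12C ratio of 0.0108, and that the mass defect is taken relative to the nearest integer mass. The function name `md_c_m_c` and its parameters are hypothetical.

```python
# Hypothetical helper for illustration; not part of the PFΔScreen API.
C13_RATIO = 0.0108  # approximate natural 13C/12C abundance ratio (assumption)


def md_c_m_c(mz, intensity_m, intensity_m1):
    """Return (m/C, MD/C) for one feature.

    The carbon number is estimated from the M+1/M intensity ratio,
    assuming the M+1 peak is dominated by 13C.
    """
    if intensity_m <= 0 or intensity_m1 <= 0:
        raise ValueError("both isotope intensities must be positive")
    n_carbon = (intensity_m1 / intensity_m) / C13_RATIO
    mass_defect = mz - round(mz)  # defect relative to the nearest integer mass
    return mz / n_carbon, mass_defect / n_carbon


# Example: PFOA [M-H]- (m/z 412.9664) with an M+1 peak at 8.64% of M,
# which corresponds to an estimated carbon number of about 8.
m_per_c, md_per_c = md_c_m_c(412.9664, 1000.0, 86.4)
```

Highly fluorinated compounds tend to combine a large m/C with a negative MD/C, which is why both values can be used for data reduction; suitable cutoffs depend on the sample matrix.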

S3 Soil sampling
Soils were sampled on four agricultural fields in one diagonal over the respective area. Soils R1 and R2 were sampled near Rastatt, both within the 0-30 cm horizon. Soil R2 was a sandy loam soil, with pH 5.5 and organic content of 0.8%. Soil R1 was a loamy sand soil, with pH 6.9 and organic content of 2.3%. Soils M1 and M2 were sampled near Mannheim within the 0-30 and 0-50 cm horizon, respectively. Soil M1 was a loam soil, with pH 7.1 and organic content of 6.6%. Soil M2 was a clay loam soil, with pH 7.0 and organic content of 3.9%. All samples were homogenized and mixed thoroughly [1].

1) Download PFΔScreen: Download the PFΔScreen source code from https://github.com/JonZwe/PFAScreen by clicking on the green "Code" button and then on "Download ZIP". Once downloaded, unzip the folder and move it to a local folder on your computer.
2) Automatic installation of Python and the required packages with Installation.bat: Navigate into the folder where PFΔScreen was copied (PFAScreen-main). Double-click the Installation.bat file. Note that, depending on your Windows security settings, a warning notification might open that needs to be accepted. The Windows command line interface will open, and the Microsoft Store opens automatically if you do not have Python installed on your computer. Click on "Install", wait until the installation of Python is finished, and close the Microsoft Store. Back in the Windows command line, press any key to automatically install pip (the package manager for Python) and subsequently all required Python packages. Finally, when the message "Installation successfully finished" appears, press any key to complete the installation.
Note that the Python source code (without the automatic installation via batch files) can also be executed on other operating systems within the Python environment. In that case, the respective packages need to be installed manually.
[...] threshold. In case another feature finding procedure (e.g., from vendor software) is desired, custom feature lists (see external_feature_list.xlsx on GitHub) together with the respective mzML files can instead be included in PFΔScreen (without peak finding by OpenMS). The feature lists are selected with the "Browse SampleFeatures.xlsx" and "Browse BlankFeatures.xlsx" buttons and preprocessed with the "Run ExternalFeatureTable" button (Nr. 2 and 4 in Fig. S1). Note that data evaluation only works when the corresponding mzML files are also given; otherwise MS2 data would be missing. Once the FeatureFinding tab is completed, the RawDataVisualization (C in Fig. S1) can be used even without PFAS-specific data. To perform the PFASPrioritization (B in Fig. S1), appropriate input parameters can be set, and PFAS-specific data evaluation is then started by clicking the "Run PFASPrioritization" button (Nr. 5 in Fig. S1). This task is usually computed in less than one minute, allowing convenient adjustment of the input parameters.

Fig. S2: PFΔScreen results table (here as formatted Excel table, a CSV file is also provided). This table summarizes most calculations performed in the PFAS feature prioritization steps and is directly formatted as a table to conveniently sort and slice data. m/z and RT values can easily be copied and, for instance, EICs or MS spectra (and coelution correlation) can be visualized in the RawDataVisualization tool of PFΔScreen.

Fig. S3: PFΔScreen interactive HTML plots. (a) m/z vs. RT of all features and the precursors with MS2 raw spectra. Blue corresponds to a detected feature, yellow to a feature with an assigned MS2 spectrum, and green displays all MS2 spectra. This plot can be used to find suitable m/z and RT tolerances for MS2 alignment depending on the chromatography (e.g., peak width) and the MS2 scan rate. (b) m/z vs. RT overview with m/C as colormap. (c) MD/C-m/C plot to deduce reasonable cutoffs for data reduction depending on the sample matrix. (d) m/C histogram to visualize the m/C distribution of the measured sample. (e) KMD plot coupled to (f) m/z vs. RT to easily verify the systematic RT-shift of each detected homologous series (see Fig. S5).
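The MS2 alignment mentioned for panel (a) can be pictured as a tolerance match between feature positions and MS2 precursor positions. The sketch below is an illustration under stated assumptions, not the PFΔScreen code: the function name `assign_ms2`, the tuple layout, and the default tolerances (0.005 in m/z, 10 s in RT) are hypothetical.

```python
# Hypothetical helper for illustration; not part of the PFΔScreen API.
def assign_ms2(features, precursors, mz_tol=0.005, rt_tol=10.0):
    """Map each feature index to the MS2 precursor indices within tolerance.

    features, precursors: lists of (mz, rt_in_seconds) tuples.
    Features without any matching precursor are left out of the result.
    """
    matches = {}
    for i, (f_mz, f_rt) in enumerate(features):
        hits = [
            j
            for j, (p_mz, p_rt) in enumerate(precursors)
            if abs(p_mz - f_mz) <= mz_tol and abs(p_rt - f_rt) <= rt_tol
        ]
        if hits:
            matches[i] = hits
    return matches


features = [(412.9664, 600.0), (498.9302, 700.0)]
precursors = [(412.9660, 603.0), (412.9664, 650.0), (300.0000, 700.0)]
assigned = assign_ms2(features, precursors)
```

Broad chromatographic peaks or a slow MS2 scan rate call for a wider RT tolerance, which is the adjustment the plot is meant to support.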

Fig. S4: PFΔScreen interactive figure from the RawDataVisualization tool. (a) EICs can be generated from comma-separated lists of m/z values or, for a single m/z value, n homologues of a common repeating unit (e.g., CF2) are generated automatically. (b) Extracted MS1 spectrum at a particular RT of interest. (c) When a chemical formula of a suspected compound (e.g., a suspect hit for PFOA, C8F15O2 for [M-H]-) is given, the theoretical isotope pattern is overlaid on the normalized cutout of the experimental spectrum at this specific m/z. (d) MS2 spectra of an m/z of interest can also be visualized with annotations and fragment mass differences.
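The homologue generation in panel (a) amounts to adding and subtracting multiples of the repeating unit from a seed m/z. A minimal sketch follows, assuming singly charged ions and homologues on both sides of the seed; the function name `homologue_mzs` is hypothetical and the tool itself may generate the list differently.

```python
# Hypothetical helper for illustration; not part of the PFΔScreen API.
CF2_MASS = 49.996806  # monoisotopic mass of CF2 (12C + 2 x 19F)


def homologue_mzs(mz, n, repeating_unit=CF2_MASS):
    """Return the seed m/z plus n homologues on each side, sorted ascending.

    Assumes singly charged ions; non-positive m/z values are dropped.
    """
    candidates = [mz + k * repeating_unit for k in range(-n, n + 1)]
    return [m for m in candidates if m > 0]


# Seed: PFOA [M-H]-; one CF2 unit less and one more give the neighbouring PFCAs.
eic_targets = homologue_mzs(412.9664, 1)
```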

Fig. S5: Interactive KMD tooltips to visualize RT-shifts with increasing m/z for each detected HS. (a) Systematic (fits to PFCAs) and (b) non-systematic RT-shift (potential false-positive unknown group of compounds).
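The check illustrated in this figure can be sketched in two steps: members of a CF2-based homologous series share nearly the same Kendrick mass defect, and a plausible series shows an RT that rises with m/z. The code below is a sketch under these assumptions (CF2 as Kendrick base, defect taken relative to the nearest integer, strictly increasing RT as the criterion for a systematic shift). Both function names are hypothetical, and sign conventions for the KMD vary between tools.

```python
# Hypothetical helpers for illustration; not part of the PFΔScreen API.
CF2_EXACT = 49.996806  # monoisotopic mass of CF2
CF2_NOMINAL = 50


def kendrick_mass_defect(mz):
    """CF2-based Kendrick mass defect (Kendrick mass minus nearest integer)."""
    kendrick_mass = mz * CF2_NOMINAL / CF2_EXACT
    return kendrick_mass - round(kendrick_mass)


def rt_shift_is_systematic(series):
    """Return True if RT strictly increases with m/z.

    series: list of (mz, rt) tuples belonging to one homologous series.
    """
    rts = [rt for _, rt in sorted(series)]
    return all(earlier < later for earlier, later in zip(rts, rts[1:]))


pfca_like = [(362.9696, 480.0), (412.9664, 540.0), (462.9632, 600.0)]
suspicious = [(362.9696, 480.0), (412.9664, 420.0), (462.9632, 600.0)]
```

A series such as `suspicious` would still be grouped by its KMD, which is why the linked m/z vs. RT view is useful for spotting potential false positives.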

Fig. S6: Example of an MS2 spectrum where unknown chemical formulas (here only C8F17) of fragments are calculated by propagation of chemical formulas from diagnostic fragments via fragment mass differences.
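The first step of such a propagation, finding fragment pairs separated by a known mass difference, can be sketched as follows. This is an illustration only: the difference list (CF2 and HF), the tolerance of 0.002, and the function name `fragment_differences` are assumptions, and the formula propagation itself is not shown.

```python
from itertools import combinations

# Hypothetical helper for illustration; not part of the PFΔScreen API.
KNOWN_DIFFS = {"CF2": 49.996806, "HF": 20.006228}  # monoisotopic masses


def fragment_differences(fragment_mzs, tol=0.002):
    """Return (lower_mz, higher_mz, label) for pairs matching a known difference."""
    hits = []
    for low, high in combinations(sorted(fragment_mzs), 2):
        delta = high - low
        for label, mass in KNOWN_DIFFS.items():
            if abs(delta - mass) <= tol:
                hits.append((low, high, label))
    return hits


# C2F5-, C3F7- and C4F9- fragments plus one unrelated peak.
pairs = fragment_differences([118.9926, 168.9894, 218.9862, 150.0000])
```

If the formula of one fragment in a matched pair is known from a diagnostic fragment list, the formula of its partner follows by adding or removing the matched unit.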

Fig. S9: Results from the EIC correlator of the RawDataVisualization tool of PFΔScreen for the in-source fragment m/z = 966.9944 (that corresponds to 6:2/8:2 FTMA diol sulfoxide sulfone) at an RT-width of 20 s and an R2 correlation threshold of > 0.95.
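A coelution correlation of this kind can be sketched as the squared Pearson correlation between the target EIC and each candidate EIC. The code below is a minimal illustration, not the PFΔScreen implementation: it assumes that all EICs are already cut to the same RT window (e.g., 20 s) and sampled at the same scans, and the names `r_squared` and `coeluting` are hypothetical.

```python
# Hypothetical helpers for illustration; not part of the PFΔScreen API.
def r_squared(x, y):
    """Squared Pearson correlation of two equally long intensity traces."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("traces must have the same length (at least 2 points)")
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0  # a flat trace carries no peak-shape information
    return sxy * sxy / (sxx * syy)


def coeluting(target_eic, candidate_eics, threshold=0.95):
    """Return names of candidate EICs whose R2 with the target exceeds the threshold."""
    return [
        name
        for name, eic in candidate_eics.items()
        if r_squared(target_eic, eic) > threshold
    ]


target = [0, 10, 50, 100, 50, 10, 0]
candidates = {
    "same_shape": [0, 5, 25, 50, 25, 5, 0],
    "earlier_peak": [100, 50, 10, 0, 0, 0, 0],
}
partners = coeluting(target, candidates)
```

Ions that originate from the same compound, such as in-source fragments and their precursor, share a peak shape and therefore correlate strongly, whereas ions that merely elute nearby do not.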

Fig. S10: Cutout from an O- and CF2-based KMD vs. m/z plot from the soil extract of M1 showing the different sulfur oxidation states from one to four oxygen atoms.