Introduction

The origin of life is a still unsolved problem that continues to fascinate generations of scientists [1]. Unravelling the routes that led to the transition from an abiotic scenario composed of complex chemical species to simple living systems is perhaps one of the biggest scientific questions of our time. This transition requires the synthesis of the building blocks of life (amino acids, sugars, nucleotides), conditions leading to the polycondensation of these blocks into far-from-equilibrium proto-biopolymers (proto-nucleic acids, proto-peptides, proto-oligosaccharides), the emergence of biochemical function, and the later organization and compartmentalization into supramolecular systems resembling proto-cells. Analytical chemistry therefore plays a central role by providing new tools to probe the extremely complex, seemingly intractable mixtures involved in these prebiotic reactions [2].

Prebiotic chemistry experiments investigating these questions typically produce a large and complex inventory of chemical species rather than a single product of biological relevance, and only a select few of these species react further to form higher-order structures. As is common practice in synthetic organic chemistry, preliminary screening of complex prebiotic reaction mixtures is often performed by thin layer chromatography (TLC) because of its simplicity and speed [3]. Developed TLC plates are most commonly visualized using dyes or fluorescence to identify the spots corresponding to the separated components [4].

Standard TLC analysis is rapid, but lacks chemical specificity. For more comprehensive TLC analysis, mass spectrometry (MS) can be employed to read the plates through extraction or direct probing of their surface [5–8]. Direct MS can be performed statically on select spots, or dynamically in microprobe MS imaging mode (MSI), the latter eliminating the need to know a priori where the separated compounds are located. Microprobe MSI involves spatially resolved desorption/ionization of analytes from the surface, followed by mass analysis, occasionally in tandem fashion (MS/MS). Experiments are performed by rastering the ion source probe in unidirectional line scans across the sample, recording mass spectra, and translating those into a chemical image [9–11]. MSI of 1D and 2D high performance (HP) TLC plates has been successfully demonstrated with a number of MS techniques, including matrix assisted laser desorption ionization (MALDI) [12], desorption electrospray ionization (DESI) [9, 13–15], and liquid microjunction-surface sampling probes [16], among others [17, 18].
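The raster-to-image step can be sketched in a few lines. This is a minimal illustration, not DetectTLC code: it assumes (our assumption) that every spectrum has already been binned onto a common m/z axis and that spectra arrive in acquisition order. Because the line scans are unidirectional, consecutive spectra simply fill one image row after another, and the hypothetical helper `assemble_datacube` needs no row reversal.

```python
import numpy as np


def assemble_datacube(spectra, n_lines, pixels_per_line):
    """Arrange raster spectra into an (y, x, m/z) datacube.

    spectra: array of shape (n_lines * pixels_per_line, n_mz), in acquisition
    order, with every line scanned in the same direction (unidirectional raster).
    """
    spectra = np.asarray(spectra)
    expected = n_lines * pixels_per_line
    if spectra.shape[0] != expected:
        raise ValueError(f"expected {expected} spectra, got {spectra.shape[0]}")
    # Row-major reshape: each block of pixels_per_line spectra becomes one image row.
    return spectra.reshape(n_lines, pixels_per_line, spectra.shape[1])
```

A single-m/z chemical image is then just one slice of the cube, for example `cube[:, :, j]` for the j-th m/z bin. A serpentine (bidirectional) raster would additionally require reversing every other row.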

Targeted interpretation of microprobe MSI data from HPTLC plates is straightforward if the m/z values of the analytes of interest are known: images for these m/z values are simply extracted from the (x,y,m/z) data cube. However, untargeted detection of unknown species undermines the high-throughput appeal of TLC, because MSI datacubes typically contain thousands of chemical images that must be inspected. Mining useful information from untargeted MSI HPTLC datasets can easily take hours or even days, depending on mixture complexity. An automated workflow that detects spots in MS images from HPTLC plates, even when their m/z values are not known a priori, would significantly increase the usability of HPTLC-MSI approaches.
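The targeted case amounts to summing intensities within a narrow m/z window around the analyte of interest. The sketch below is an illustration under stated assumptions: the data are held as a NumPy cube with a matching array of m/z bin centres, and the function name `ion_image` and the default ±5 mmu tolerance are ours, not part of DetectTLC.

```python
import numpy as np


def ion_image(cube, mz_axis, target_mz, tol_mmu=5.0):
    """Extract a selected ion image from an (rows, cols, n_mz) datacube.

    Sums intensities of all m/z bins within +/- tol_mmu (milli-mass units)
    of target_mz; returns an all-zero image if no bin falls in the window.
    """
    mz_axis = np.asarray(mz_axis, dtype=float)
    window = np.abs(mz_axis - target_mz) <= tol_mmu / 1000.0
    if not window.any():
        return np.zeros(cube.shape[:2])
    return cube[:, :, window].sum(axis=2)
```

In the untargeted setting, the same extraction would have to be repeated for every m/z channel, which is what produces the thousands of candidate images described above.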

Algorithms commonly used for MSI data mining have included principal component analysis (PCA) [19, 20], non-negative matrix factorization (NMF) [21], and clustering [22]. Despite their popularity, these approaches are not intended for individually detecting the many components present in a complex mixture, but rather for extracting patterns that combine the most salient components. We therefore developed a new approach for automatically detecting individual spots on TLC MSI images, named DetectTLC. Its core innovation is that m/z images containing elevated ion abundances in spot-like clusters are distinguished from noninformative images on the basis of differences in their quantitative features, such as compactness, eccentricity, extent, solidity, and entropy.

We present DetectTLC by applying it to a case study where DESI MSI experiments were performed on proto-nucleic acid building block reaction mixtures separated onto 2D HPTLC plates. These mixtures contained pyrazinone (PZO) or aminopyrazine (APZ) nucleoside analogs that are potentially capable of complementary self-assembly (hydrogen bonding) required for replication [23]. DetectTLC, however, is flexible enough to be applicable to any other HPTLC-MSI experiment.

Experimental Methods

All DetectTLC development was done in MATLAB (MathWorks, Natick, MA, USA). Full details of the algorithms used and assessment of their function are included in the Supporting Information. DetectTLC can be downloaded from the following address: http://www.bio-miblab.org/software/for_testing.zip with username: miblab, password: br6TrUja. Full details of the synthesis and TLC separation of the PZO and APZ mixtures are included in Supporting Information Section 1.1.

Mass spectrometry images were acquired using combinations of two ion source/imaging stages and mass spectrometers to test the ability of DetectTLC to process datasets from instruments with different spatial and mass spectral resolution. The PZO synthesis mixture was imaged using a 2D OmniSpray automated DESI ion source (Prosolia Inc., Indianapolis, IN, USA) coupled to a Synapt G2 HDMS mass spectrometer (Waters, Milford, MA, USA) in positive-ion mode. The m/z range acquired for each pixel was 50–250. A DESI spray of 1% acetic acid (ACS Reagent grade; Sigma-Aldrich, St. Louis, MO, USA) in acetonitrile (OmniSolv LC-MS grade; EMD Millipore Corporation, Billerica, MA, USA) with a flow rate of 5 μL min−1 and a N2 nebulizing gas (150 psi) was used for maximum sensitivity and minimal impact spot size. Stage motion was programmed such that data were acquired in 200 μm × 200 μm pixels for a total image size of 30 mm². Mass spectral data were converted into Analyze 7.5 format using Firefly conversion software (Prosolia, Inc., Indianapolis, IN, USA), and images were visualized using omniSpect (http://omnispect.bme.gatech.edu/) [21] and DetectTLC.
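As a quick consistency check (our own arithmetic, assuming the 30 mm² area is tiled exactly by square 200 μm pixels with no partial pixels at the edges), the PZO image contains roughly 750 spectra, each covering m/z 50–250:

```latex
N_{\mathrm{pixels}} \approx \frac{30\ \mathrm{mm}^2}{(0.2\ \mathrm{mm})\times(0.2\ \mathrm{mm})}
 = \frac{30\ \mathrm{mm}^2}{0.04\ \mathrm{mm}^2} = 750
```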

The APZ reaction mixture was imaged using an OmniSpray DESI probe (Prosolia Inc., Indianapolis, IN, USA) spraying acetonitrile at a flow rate of 8 μL min−1 with a N2 nebulizing gas pressure of 130 psi. The sample was rastered using a motorized microscope stage (OptiScan II; Prior Scientific Inc., Rockland, MA, USA) controlled by a Labview VI described elsewhere [21, 24], moving at a speed of 160 μm s−1 with a line step of 200 μm. Mass spectra were acquired on an Exactive Plus Orbitrap mass spectrometer (Thermo Scientific, San Jose, CA, USA) in positive-ion mode as centroided data, with the following settings held constant throughout the experiment: mass resolution (35,000), no. of microscans (2), automatic gain control target (3 × 10⁶), maximum injection time (50 ms), spray voltage (4 kV), capillary temperature (250°C), and S lens rf level (60 V). The imaging data were acquired in a multimodal fashion by alternating each scan between Full MS (no collision energy applied) and All-Ion Fragmentation (AIF, collision energy = 60 arb.) modes. No precursor ion selection takes place during AIF scans prior to higher-energy collisional dissociation (HCD); therefore, all ions entering the HCD cell were fragmented and detected. In this manner, alternating pixels in the MS image represented ions from the two modes. The dataset may be visualized either (1) as a complete dataset with pixels from both modes included, or (2) as separate Full MS or AIF images after separating the scan modes using Xcalibur software (Thermo Scientific, Inc.). Based on stage motion and data acquisition speed, the pixel area corresponding to a single scan was 80 × 200 μm for the complete dataset. When the dataset is separated into the individual Full MS or AIF datasets, the m/z vs time data are interpolated, resulting in a doubling of the effective pixel size (160 × 200 μm). RAW data files were converted to the .mzxml format using ‘msconvert’ for visualization using omniSpect and DetectTLC.
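The scan-mode separation and the resulting pixel widths can be sketched as follows. Both helpers are hypothetical illustrations, not Xcalibur or DetectTLC functions: `split_alternating_scans` assumes the scans strictly alternate and simply de-interleaves them by position (it does not reproduce the interpolation step mentioned above), and the scan time of about 0.5 s used in the example is our inference from 80 μm ÷ 160 μm s−1, not a reported setting.

```python
def split_alternating_scans(scans, first_mode="FullMS"):
    """De-interleave a scan sequence into (full_ms, aif) lists.

    Assumes scans strictly alternate, starting with first_mode ("FullMS" or
    "AIF"). Real data would more robustly be split using each scan's
    recorded mode metadata rather than its position.
    """
    even, odd = list(scans[0::2]), list(scans[1::2])
    if first_mode == "FullMS":
        return even, odd
    if first_mode == "AIF":
        return odd, even
    raise ValueError("first_mode must be 'FullMS' or 'AIF'")


def pixel_width_um(stage_speed_um_s, scan_time_s, n_modes=1):
    """Distance (μm) the stage travels during n_modes consecutive scans."""
    return stage_speed_um_s * scan_time_s * n_modes
```

With a stage speed of 160 μm s−1 and an assumed 0.5 s per scan, `pixel_width_um(160, 0.5)` gives the 80 μm single-scan width, and `pixel_width_um(160, 0.5, n_modes=2)` gives the 160 μm effective width per mode after separation.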

Results and Discussion

The synthesis of pyrazin-2-one (PZO) and 2-aminopyrazine (APZ) monomers is of interest to prebiotic chemistry [23] (Supplementary Scheme SI 1, 2). These were used as model organic reactions to demonstrate the capabilities of DetectTLC, but the approach is widely applicable to any TLC-separated complex mixture. Reaction products were separated using 2D reversed phase HPTLC. The fluorescence image obtained from HPTLC separation of the PZO mixture using solvent system “A” is shown in Figure 1a. Three major, previously identified PZO products (1–3; chemical structures shown in Figure 1b) were observed in ion images generated manually from the MSI data (Figure 1c).

Figure 1

Fluorescence image of a developed HPTLC plate with the area imaged by DESI-MSI outlined in green (a). Identified products of the PZO reaction (Supplementary Scheme SI 1, only 5-isomer is shown), with the numbers indicating the spatial location of each product in the fluorescence and MS images (b). Selected ion images acquired by DESI-MSI of reaction products previously identified (c), and the corresponding spot-like images as they were automatically generated by DetectTLC using the preset Protocol 1 processing pipeline (d)

NMF and PCA were applied to the 2D-HPTLC-MSI datasets to assess the ability of these models to identify individual product spots (Supplementary Figures SI 2, 3). For NMF, the data were separated into five components. Three of the component images showed spot-like features corresponding to combinations of products 1–3, with only two unique spatial distributions. The two other components yielded images and spectra corresponding to chemical background. For PCA, the first five components accounted for 73% of the variation within the dataset. PC 1 and PC 2 showed spot-like features, each with a unique spatial distribution, while PC 3–5 largely represented chemical noise within the dataset. These results are to be expected, because both NMF and PCA aim to identify the predominant patterns within a dataset, not to extract a complete set of spot-like features. The two spot-like features identified by NMF and PCA are the areas with the highest ion intensities, which constitute the most notable pattern within the dataset. However, additional product spots may be of high chemical significance yet go unidentified by NMF or PCA, making these methods unsuitable for reliable TLC plate analysis. These results highlight the need for a tool such as DetectTLC, which can automatically characterize the chemical diversity within a 2D TLC separation.
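To make the "percentage of variation" figure concrete, the sketch below performs PCA on an MSI datacube via a singular value decomposition, treating pixels as observations and m/z channels as variables. It is an illustration only: it applies mean-centring and nothing else, whereas the normalization or scaling used in the analysis above is not stated, and the function name `pca_on_datacube` is ours.

```python
import numpy as np


def pca_on_datacube(cube, n_components=5):
    """PCA of an (rows, cols, n_mz) datacube via SVD.

    Returns (explained_variance_ratio, score_images, loadings) for the first
    n_components components; only mean-centring is applied.
    """
    rows, cols, n_mz = cube.shape
    X = cube.reshape(rows * cols, n_mz).astype(float)  # astype copies, input is untouched
    X -= X.mean(axis=0)                                # mean-centre each m/z channel
    U, s, Vt = np.linalg.svd(X, full_matrices=False)
    ratio = s ** 2 / np.sum(s ** 2)                    # fraction of total variance per PC
    k = min(n_components, s.size)
    score_images = (U[:, :k] * s[:k]).reshape(rows, cols, k)  # one image per PC
    return ratio[:k], score_images, Vt[:k]
```

The cumulative value `ratio.sum()` is the quantity reported as the variation captured by the first five components. Because each component is chosen to maximize explained variance, a few intense spots can dominate the leading components while weaker spots end up spread across, or hidden in, later ones.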

Fully automated analysis of the MSI data using DetectTLC generated more than 50 images containing individual spots that corresponded to various analytes on the TLC plate. These included the known products, as shown in Figure 1d.

Experimental m/z values for these products were consistent with theoretical values, with an average error of 1 mmu. Interestingly, a systematic difference of approximately 5 mmu on average was observed between the raw ion image m/z values and the DetectTLC images, which was attributed to the centroiding process used. Mass spectra were acquired in profile mode, but the m/z peaks observed are not truly Gaussian; therefore, the local m/z maximum value observed differed slightly from the centroided m/z value used by DetectTLC [25]. Differences in image contrast between the manually-obtained and DetectTLC images were also observed and attributed to the median filtering step that removes background noise pixels in the DetectTLC image processing pipeline, discussed below.
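A small numerical example shows why an apex m/z and a centroided m/z can differ by a fraction of a mmu or more when a profile peak is asymmetric. This is not the vendor's or DetectTLC's centroiding algorithm; as a stand-in we use a simple intensity-weighted mean of the profile points above half maximum, and the function names are hypothetical.

```python
import numpy as np


def apex_mz(mz, intensity):
    """m/z of the most intense profile point (the local maximum)."""
    return float(mz[int(np.argmax(intensity))])


def centroid_mz(mz, intensity, frac=0.5):
    """Intensity-weighted mean m/z of the points at or above frac * max."""
    mz = np.asarray(mz, dtype=float)
    intensity = np.asarray(intensity, dtype=float)
    mask = intensity >= frac * intensity.max()
    return float(np.sum(mz[mask] * intensity[mask]) / np.sum(intensity[mask]))


def error_mmu(measured, reference):
    """Mass difference in milli-mass units."""
    return (measured - reference) * 1000.0


# A peak that tails towards higher m/z.
mz = np.array([100.000, 100.001, 100.002, 100.003, 100.004])
intensity = np.array([10.0, 100.0, 80.0, 60.0, 10.0])
shift = error_mmu(centroid_mz(mz, intensity), apex_mz(mz, intensity))
```

For this made-up peak, the centroid lies about 0.8 mmu above the apex. The size of the shift depends entirely on the peak shape and the centroiding rule, so the example illustrates the mechanism rather than the 5 mmu difference reported above.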

One key advantage of DetectTLC is its ability to identify reaction products that may not be anticipated. When the PZO mixture was separated by a second solvent system “B”, a new, intense fluorescence spot, different from 1–3, was detected (Supplementary Figure SI 1). DetectTLC was applied to the MSI dataset generated with the new solvent system, and the results included an ion image at m/z 167.0353 co-localized with the unknown fluorescent spot, yielding a chemical formula of C8H11N2O2 (Δm = 0.8 mmu). Taking into consideration the reaction chemistry, this species was tentatively identified as 3,5(6)-dimethyl-4-oxoethyl-2-oxo-pyrazin-4-ium, a plausible reaction side-product (Supplementary Schemes SI 1, 4).

The m/z 167.0353 image ranked 13th in the DetectTLC results and was displayed on the first page of 24 images in the main graphical user interface (GUI) window. This exemplifies the ability of DetectTLC to detect chemical species in an untargeted, automatic manner, revealing species that may have gone undetected with a more targeted approach.

The DetectTLC workflow is shown in Figure 2. An outline of the operations involved is presented here, and additional details are given in the Supporting Information. First, 2D median filtering and pixel count-based filtering are applied to each m/z image, respectively to remove spatially dispersed, low-intensity noise and to reject images that are too sparse or too noisy. Second, the images that pass this step are thresholded to binary images to identify spot-like regions regardless of signal intensity. This facilitates the detection of low-abundance species separated by TLC, and leaves the decision on their significance to the user. Third, morphologic opening is applied to smooth discontinuous pixel clusters into spot-like regions, thus correcting for signal variability and improving the feature-based detection that follows. Fourth, quantitative image features are used to score each m/z image according to how strongly it resembles an image containing a spot. Eight image features are currently available in DetectTLC: seven are shape-based (area, compactness, convex area, eccentricity, extent, number of connected regions, and solidity), and the eighth is texture-based (entropy). During routine analysis, these steps are performed automatically using one of two preset protocols, one tailored to larger TLC spots and one to smaller spots, neither of which requires user input. Finally, m/z images are ranked by their feature values and visualized in the DetectTLC GUI (Figure 3). Through this GUI, the user may save the average mass spectrum across a TLC spot together with the corresponding m/z image for further analysis. Selected ion images and a list of m/z values that produce images with spots may also be batch-exported to a target folder (Supplementary Figure SI 4).
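To illustrate the feature-scoring step, the sketch below computes four of the eight features (area, number of connected regions, extent, and entropy) using general image-analysis definitions loosely modelled on MATLAB's `regionprops` and `entropy` conventions. It is not DetectTLC's implementation. Our assumptions include 8-connectivity, computing area and extent on the largest region, and computing entropy from a 256-bin histogram of the intensity image. Compactness, convex area, eccentricity, and solidity are omitted for brevity.

```python
from collections import deque

import numpy as np


def label_regions(mask):
    """8-connected component labelling of a boolean image (label 0 = background)."""
    mask = np.asarray(mask, dtype=bool)
    labels = np.zeros(mask.shape, dtype=int)
    rows, cols = mask.shape
    n = 0
    for r in range(rows):
        for c in range(cols):
            if mask[r, c] and labels[r, c] == 0:
                n += 1
                labels[r, c] = n
                queue = deque([(r, c)])
                while queue:  # breadth-first flood fill of one region
                    y, x = queue.popleft()
                    for dy in (-1, 0, 1):
                        for dx in (-1, 0, 1):
                            yy, xx = y + dy, x + dx
                            if 0 <= yy < rows and 0 <= xx < cols and mask[yy, xx] and labels[yy, xx] == 0:
                                labels[yy, xx] = n
                                queue.append((yy, xx))
    return labels, n


def shape_features(mask):
    """Area and extent of the largest region, plus the number of regions."""
    labels, n = label_regions(mask)
    if n == 0:
        return {"area": 0, "n_regions": 0, "extent": 0.0}
    sizes = np.bincount(labels.ravel())[1:]   # pixel count of each region
    largest = int(np.argmax(sizes)) + 1
    ys, xs = np.nonzero(labels == largest)
    bbox_area = (ys.max() - ys.min() + 1) * (xs.max() - xs.min() + 1)
    area = int(sizes[largest - 1])
    return {"area": area, "n_regions": n, "extent": float(area / bbox_area)}


def image_entropy(image, bins=256):
    """Shannon entropy (bits) of the image's intensity histogram."""
    hist, _ = np.histogram(np.asarray(image, dtype=float), bins=bins)
    p = hist[hist > 0] / hist.sum()
    return float(-np.sum(p * np.log2(p)))
```

A compact, single spot gives one region with high extent, while scattered noise gives many small regions with low extent. Features like these allow thousands of m/z images to be sorted without user input.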
Secondary windows, along with a full description of all data processing and visualization possibilities available—including advanced options facilitating manual data filtering and processing by the user—are discussed in the Supporting Information (Supplementary Figures SI 5, 9, Supplementary Table SI 1).

Figure 2

An overview of the workflow used in the DetectTLC software tool

Figure 3

The main graphical user interface, displaying the top 24 images containing spot-like regions. Data are first uploaded and (optionally) de-isotoped, after which the user may select Protocol 1 or Protocol 2 for feature selection, or design a custom processing pipeline through the Advanced Options tab. Spots with similar spatial distributions may be identified using the Similarity Options. Selected images and/or spectra may be exported using the Export Options

Various processing pipelines within the DetectTLC framework were assessed for their ability to identify TLC spots. Every possible combination of image processing steps, 128 pipelines in all, was evaluated based on the number of true spots identified in the top 40 images generated when applied to the PZO synthesis dataset. A full assessment of the results for all pipelines is included in the Supporting Information (Tables SI 2, 3), along with the top 40 images found by the area, compactness, and entropy features (Supplementary Figures SI 6–8). For thresholding, Otsu’s method and selected manual thresholds yielded comparable results across all other processing variables. The key difference between these two approaches is that manual thresholding requires user input, whereas Otsu’s method is user-independent. In almost every combination of processing methods, morphologic opening yielded more spot-containing images than dilation (on average, twice as many true spots across all combinations). The other processing step that significantly affected the number of true spots identified was median filtering. For this dataset, the 7 × 7 median filter performed notably better than no filter. The user can input different filter sizes through the Advanced Options menu and select the most appropriate size for the dataset under analysis.

Ultimately, the analysis pipeline of (1) removal of images with <5 or >1500 non-zero pixels, (2) 7 × 7 median filtering, (3) Otsu’s threshold, and (4) morphologic opening provided the best results for the PZO dataset. When this pipeline was combined with entropy as the image feature for ranking, two of the known products in the PZO reaction mixture were included in the top five images output by DetectTLC, and all three known products were within the top 40 ranked images. For other chemical systems, DetectTLC settings could be tailored to provide optimum results and to obtain a complete list of m/z images containing spots.
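The four-step pipeline can be sketched in NumPy as follows. This is a minimal re-creation under stated assumptions, not the MATLAB implementation. We assume zero padding for the median filter, a 256-bin histogram for Otsu's method, and a 3 × 3 square structuring element for the opening (the element used by DetectTLC is not given here). For erosion we pad with True, so that spots touching the image border are not eroded away. All function names are hypothetical.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view


def median_filter(img, size=7):
    """2D median filter with zero padding at the image borders."""
    padded = np.pad(img, size // 2, mode="constant")
    return np.median(sliding_window_view(padded, (size, size)), axis=(-2, -1))


def otsu_threshold(img, bins=256):
    """Otsu's method: the histogram split that maximizes between-class variance."""
    hist, edges = np.histogram(img.ravel(), bins=bins)
    centers = (edges[:-1] + edges[1:]) / 2
    w0 = np.cumsum(hist)                     # pixel count in the lower class
    w1 = w0[-1] - w0                         # pixel count in the upper class
    s0 = np.cumsum(hist * centers)
    mu0 = s0 / np.maximum(w0, 1)
    mu1 = (s0[-1] - s0) / np.maximum(w1, 1)
    k = int(np.argmax(w0 * w1 * (mu0 - mu1) ** 2))
    return edges[k + 1]                      # lower edge of the first upper-class bin


def binary_opening(mask, k=3):
    """Erosion followed by dilation with a k x k square structuring element."""
    pad = k // 2
    eroded = sliding_window_view(np.pad(mask, pad, constant_values=True), (k, k)).all(axis=(-2, -1))
    return sliding_window_view(np.pad(eroded, pad, constant_values=False), (k, k)).any(axis=(-2, -1))


def process_ion_image(img, min_px=5, max_px=1500, median_size=7):
    """Return a binary spot mask for one m/z image, or None if it is rejected."""
    img = np.asarray(img, dtype=float)
    n_nonzero = np.count_nonzero(img)
    if n_nonzero < min_px or n_nonzero > max_px:   # (1) pixel-count filter
        return None
    filtered = median_filter(img, median_size)      # (2) 7 x 7 median filter
    if filtered.max() == filtered.min():
        return None                                 # nothing left after filtering
    mask = filtered >= otsu_threshold(filtered)     # (3) Otsu's threshold
    return binary_opening(mask)                     # (4) morphologic opening
```

Because Otsu's method picks the threshold from each image's own histogram, a faint spot and an intense spot both yield a binary region. This is the intensity-independent behaviour described earlier, and it is why the subsequent shape features can rank low-abundance species alongside abundant ones.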

To confidently assign analyte structure by mass spectrometry, MS fragmentation data must be obtained. Modern mass spectrometers allow for data to be acquired in multiple modes sequentially, in real time. For example, alternating between low and high collision-energy scans enables data corresponding to precursor and product ions to be acquired in a single acquisition cycle. In true tandem mass spectrometers, the precursor ion is first selected for activation, but when many analytes are simultaneously ionized as in MSI, it becomes challenging to mass-select all individual precursor ions while maintaining high imaging throughput. Matching of precursor to product ions in alternating scan experiments can, for example, be performed based on 1D chromatographic profile similarity [26].

Similarly, DetectTLC includes algorithms capable of automatically pairing precursors and product ions from multimodal imaging datasets using their 2D spatial distributions on the TLC plate. This kind of similarity matching is performed in DetectTLC by first allowing the user to select the precursor ion spot of interest from a low collision-energy image (Supplementary Figure SI 9). This region of interest is then used to generate a template against which to measure similarity of the high collision-energy images using either Pearson correlation or the hypergeometric similarity measure [27, 28]. The results are displayed as a set of ion images, but the user also has the option of generating an all-fragment mass spectrum displaying peaks for m/z values with similar spatial distribution as the precursor. Further details of the similarity algorithms implemented in DetectTLC are provided under Supporting Information.
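The two similarity measures can be sketched for flattened images as follows. Pearson correlation is standard. For the hypergeometric measure, we use one common formulation that treats the template and each binarized candidate image as binary masks of N pixels and computes the probability of observing at least the measured overlap by chance. We then rank candidates by −log10 of that probability. This formulation, the binarization rule (any pixel > 0), and all function names are our assumptions and are not taken from DetectTLC or from refs [27, 28].

```python
import math


def pearson(a, b):
    """Pearson correlation of two equally sized, flattened images."""
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    va = sum((x - ma) ** 2 for x in a)
    vb = sum((y - mb) ** 2 for y in b)
    if va == 0 or vb == 0:
        return 0.0
    return cov / math.sqrt(va * vb)


def hypergeometric_pvalue(template, candidate):
    """P(overlap >= observed) for two binary masks placed at random."""
    N = len(template)
    K = sum(template)                 # pixels in the precursor template
    n = sum(candidate)                # pixels in the candidate fragment image
    k = sum(1 for t, c in zip(template, candidate) if t and c)
    tail = sum(math.comb(K, i) * math.comb(N - K, n - i) for i in range(k, min(K, n) + 1))
    return tail / math.comb(N, n)


def rank_by_similarity(template, images, method="hypergeometric"):
    """Return image keys ordered from most to least similar to the template."""
    def score(img):
        if method == "pearson":
            return pearson(template, img)
        binary = [1 if v > 0 else 0 for v in img]
        return -math.log10(max(hypergeometric_pvalue(template, binary), 1e-300))
    return sorted(images, key=lambda key: score(images[key]), reverse=True)
```

The two measures reward different properties. Pearson correlation is sensitive to intensity variation within the spot, whereas the hypergeometric measure depends only on pixel overlap. This difference is consistent with the two measures returning the same top images in a different order.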

To demonstrate this capability, a TLC plate with the separated APZ synthesis reaction mixture (Supplementary Scheme SI 2) was imaged using multimodal MSI, alternating high and low collision-energy scans. The spot corresponding to the intact protonated precursor ion of 5(6)-ethanol-2-aminopyrazine (m/z 140.0817, Schemes SI 2, 2), a predicted side-product of the APZ reaction, was first tentatively identified by DetectTLC from the low collision-energy dataset and was selected to create a template of the spot’s location (Figure 4). Pearson correlation and the hypergeometric similarity measure were applied to pair the precursor ion with product ions for structural confirmation (Supplementary Figures SI 10, 11). The top eight m/z images from the high collision-energy dataset matched using the hypergeometric similarity measure are shown in Figure 4c. Pearson correlation returned the same top spot-like m/z images, but ranked them in a different order. Upon further investigation, two of the product ions (m/z 122.0714 and 78.0345) were assigned to H2O and CH6N2O losses from the precursor ion, respectively. This fragmentation pattern supports the assignment of the precursor at m/z 140.0817 as 5(2-hydroxyethyl)-2-aminopyrazine. To validate these DetectTLC matches, liquid chromatography-tandem MS analysis was performed on the m/z 140.0817 precursor ion (see Supporting Information). Two of the expected product ions were detected, at m/z 122.0727 and 78.0381. Additional product ions at m/z 83.0627, 95.0665, 105.0503, and 110.0733 were also detected in the UPLC-MS/MS experiments (Figure 4d), probably owing to differences in sensitivity and ion activation efficiency between the instruments used.
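The neutral-loss assignments can be checked with monoisotopic masses. The sketch below computes the theoretical losses and the protonated-precursor m/z. It assumes the precursor is [C6H9N3O + H]+ (that is, C6H10N3O+), which is the elemental formula of protonated 5-(2-hydroxyethyl)-2-aminopyrazine. The helper names are ours, and the simple formula parser handles only element symbols followed by optional counts.

```python
import re

# Monoisotopic masses (u) of the elements needed here, and the electron mass.
MONOISOTOPIC = {"C": 12.0, "H": 1.00782503207, "N": 14.0030740048, "O": 15.99491461956}
ELECTRON = 0.00054857990946


def parse_formula(formula):
    """Parse e.g. 'CH6N2O' into {'C': 1, 'H': 6, 'N': 2, 'O': 1}."""
    counts = {}
    for element, n in re.findall(r"([A-Z][a-z]?)(\d*)", formula):
        counts[element] = counts.get(element, 0) + (int(n) if n else 1)
    return counts


def neutral_mass(formula):
    return sum(MONOISOTOPIC[el] * n for el, n in parse_formula(formula).items())


def cation_mz(formula):
    """m/z of a singly charged cation with the given elemental formula."""
    return neutral_mass(formula) - ELECTRON


def loss_error_mmu(precursor_mz, product_mz, loss_formula):
    """Observed neutral loss minus theoretical loss mass, in mmu."""
    return (precursor_mz - product_mz - neutral_mass(loss_formula)) * 1000.0


for product, loss in [(122.0714, "H2O"), (78.0345, "CH6N2O")]:
    print(loss, round(loss_error_mmu(140.0817, product, loss), 2))
```

Both observed losses agree with H2O (18.0106 u) and CH6N2O (62.0480 u) to within 1 mmu, and `cation_mz("C6H10N3O")` (about 140.0818) agrees with the observed precursor m/z to within 0.2 mmu.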

Figure 4

A selected ion image of the precursor ion at m/z 140.1 (a), and the template used to apply the similarity measure (b). The top eight ion images matched with the precursor ion using hypergeometric similarity measure (c). Product ion spectrum obtained by UPLC-MS/MS used to validate the DetectTLC output (d). The fragment ions detected in both UPLC-MS/MS and DetectTLC datasets are indicated with red labels

Conclusions

Through automated analysis of MSI data, DetectTLC is able to pinpoint spots on 2D HPTLC plates corresponding to both expected and unexpected reaction products, including products that are undetectable by fluorescence and co-migrating products that cannot be selectively visualized by optical means. This type of processing provides the user with information on all chemical species detected by MSI on the plate, regardless of their optical or chromatographic properties. When multimodal imaging is performed, detecting intact and fragment ions in a single acquisition, the reaction products displaying spot-like regions can also be structurally identified. Although DetectTLC was designed for processing TLC-MSI datasets, it could also be applied more broadly to MSI datasets in which other types of spot-like features are expected.