DetectTLC: Automated Reaction Mixture Screening Utilizing Quantitative Mass Spectrometry Image Features
Full characterization of complex reaction mixtures is necessary to understand mechanisms, optimize yields, and elucidate secondary reaction pathways. Molecular-level information for species in such mixtures can be readily obtained by coupling mass spectrometry imaging (MSI) with thin layer chromatography (TLC) separations. User-guided investigation of imaging data for mixture components with known m/z values is generally straightforward; however, spot detection for unknowns is highly tedious, and limits the applicability of MSI in conjunction with TLC. To accelerate imaging data mining, we developed DetectTLC, an approach that automatically identifies m/z values exhibiting TLC spot-like regions in MS molecular images. Furthermore, DetectTLC can also spatially match m/z values for spots acquired during alternating high and low collision-energy scans, pairing product ions with precursors to enhance structural identification. As an example, DetectTLC is applied to the identification and structural confirmation of unknown, yet significant, products of abiotic pyrazinone and aminopyrazine nucleoside analog synthesis.
Keywords: Imaging mass spectrometry; Data processing; Feature detection; Ambient MS; DESI MS
The origin of life is an as-yet unsolved problem that continues to fascinate generations of scientists. Unravelling the routes that led to the transition from an abiotic scenario composed of complex chemical species to simple living systems is perhaps one of the biggest scientific questions of our time. This transition requires synthesis of the building blocks of life (amino acids, sugars, nucleotides), conditions leading to polycondensation of these blocks into far-from-equilibrium proto-biopolymers (proto-nucleic acids, proto-peptides, proto-oligosaccharides), the emergence of biochemical function, and the later organization and compartmentalization into supramolecular systems resembling proto-cells. Analytical chemistry thus plays a central role by providing new tools to probe the extremely complex, seemingly intractable mixtures involved in these prebiotic reactions.
Prebiotic chemistry experiments investigating these questions typically produce a large and complex inventory of chemical precursors rather than a single product of biological relevance, only a select few of which react further to form higher-order structures. As is common practice in synthetic organic chemistry, preliminary screening of complex prebiotic reaction mixtures is often performed via thin layer chromatography (TLC) because of its simplicity and speed. Visualization of developed TLC plates most commonly uses dyes or fluorescence to identify spots corresponding to the separated components.
Standard TLC analysis is rapid, but lacks chemical specificity. For more comprehensive TLC analysis, mass spectrometry (MS) can be employed to read the plates through extraction or direct probing of their surface [5, 6, 7, 8]. Direct MS can be performed statically on select spots, or dynamically in microprobe MS imaging mode (MSI), the latter eliminating the need for knowing a priori where the separated compounds are located. Microprobe MSI involves spatially resolved desorption/ionization of analytes from the surface, followed by mass analysis, occasionally in tandem fashion (MS/MS). Experiments are performed by rastering the ion source probe in unidirectional line scans across the sample, recording mass spectra, and translating those into a chemical image [9, 10, 11]. MSI of 1D and 2D high performance (HP) TLC plates has been successfully demonstrated with a number of MS techniques, including matrix assisted laser desorption ionization (MALDI), desorption electrospray ionization (DESI) [9, 13, 14, 15], and liquid microjunction-surface sampling probes, among others [17, 18].
Targeted interpretation of microprobe MSI data from HPTLC plates is straightforward if m/z values of analytes of interest are known: images for these m/z values are simply extracted from the (x,y,m/z) data cube. However, untargeted detection of unknown species can detract from the high-throughput appeal of TLC, as MSI datacubes typically contain thousands of chemical images. Mining useful information from untargeted MSI HPTLC datasets can easily take hours or even days, depending on mixture complexity. An automated workflow to detect spots in MS images from HPTLC plates, even if their m/z values are a priori unknown, would significantly increase the usability of HPTLC-MSI approaches.
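To make the targeted case concrete, the following minimal Python sketch extracts one ion image from a sparse (x, y, m/z) data cube by summing intensity inside a mass window at every pixel. It is an illustration only and not part of DetectTLC: the function name `extract_ion_image`, the dictionary layout of `spectra`, the tolerance, and all m/z values are hypothetical.

```python
def extract_ion_image(spectra, width, height, target_mz, tol=0.005):
    """Sum the intensity within target_mz +/- tol for every pixel.

    `spectra` maps (x, y) -> list of (mz, intensity) peaks (assumed layout).
    Pixels without a matching peak stay at 0.
    """
    image = [[0.0] * width for _ in range(height)]
    for (x, y), peaks in spectra.items():
        image[y][x] = sum(i for mz, i in peaks if abs(mz - target_mz) <= tol)
    return image


# Toy 2 x 2 plate with arbitrary m/z values.
spectra = {
    (0, 0): [(100.000, 5.0), (150.000, 1.0)],
    (1, 0): [(100.001, 20.0)],
    (0, 1): [(120.000, 3.0)],
    (1, 1): [(99.999, 8.0), (100.300, 9.0)],
}
image = extract_ion_image(spectra, width=2, height=2, target_mz=100.000)
for row in image:
    print(row)
```

The untargeted problem is the inverse: every distinct m/z value yields such an image, and each one must be judged for spot-like content.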
Algorithms commonly used for MSI data mining have included principal component analysis (PCA) [19, 20], non-negative matrix factorization (NMF), and clustering. Despite their popularity, these approaches are not intended for individually detecting the many components present in a complex mixture, but rather patterns combining the most salient components. Therefore, we developed a new approach for automatically detecting individual spots on TLC MSI images, named DetectTLC. Its core innovation is that m/z images containing elevated ion abundances in spot-like clusters are distinguished from noninformative images on the basis of differences in their quantitative features, such as compactness, eccentricity, extent, solidity, and entropy.
We present DetectTLC by applying it to a case study where DESI MSI experiments were performed on proto-nucleic acid building block reaction mixtures separated onto 2D HPTLC plates. These mixtures contained pyrazinone (PZO) or aminopyrazine (APZ) nucleoside analogs that are potentially capable of complementary self-assembly (hydrogen bonding) required for replication. DetectTLC, however, is flexible enough to be applicable to any other HPTLC-MSI experiment.
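As a rough illustration of how such features can separate spot-like images from noninformative ones, the sketch below computes two of them in plain Python. The definitions are common textbook ones and are assumptions here (extent as foreground area over bounding-box area, entropy as the Shannon entropy of the pixel-value distribution); the exact definitions used by DetectTLC are those given in its Supporting Information, and the function names are hypothetical.

```python
import math
from collections import Counter


def extent(mask):
    """Foreground area divided by the area of its bounding box (assumed definition)."""
    coords = [(r, c) for r, row in enumerate(mask) for c, v in enumerate(row) if v]
    if not coords:
        return 0.0
    rows = [r for r, _ in coords]
    cols = [c for _, c in coords]
    box_area = (max(rows) - min(rows) + 1) * (max(cols) - min(cols) + 1)
    return len(coords) / box_area


def shannon_entropy(image):
    """Shannon entropy, in bits, of the distribution of pixel values."""
    values = [v for row in image for v in row]
    n = len(values)
    return -sum((k / n) * math.log2(k / n) for k in Counter(values).values())


# A compact 2 x 2 spot versus two scattered foreground pixels.
spot = [[0, 0, 0, 0],
        [0, 1, 1, 0],
        [0, 1, 1, 0],
        [0, 0, 0, 0]]
scattered = [[1, 0, 0, 0],
             [0, 0, 0, 0],
             [0, 0, 0, 0],
             [0, 0, 0, 1]]
print(extent(spot), extent(scattered))
print(shannon_entropy(spot), shannon_entropy(scattered))
```

In this toy case the compact spot fills its bounding box while the scattered pixels do not, which is the kind of quantitative difference a feature-based ranking can exploit.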
All DetectTLC development was done in MATLAB (MathWorks, Natick, MA, USA). Full details of the algorithms used and assessment of their function are included in the Supporting Information. DetectTLC can be downloaded from the following address: http://www.bio-miblab.org/software/for_testing.zip with username: miblab, password: br6TrUja. Full details of the synthesis and TLC separation of the PZO and APZ mixtures are included in Supporting Information Section 1.1.
Mass spectrometry images were acquired using combinations of two ion source/imaging stages and mass spectrometers to test the ability of DetectTLC to process datasets from instruments with different spatial and mass spectral resolution. The PZO synthesis mixture was imaged using a 2D OmniSpray automated DESI ion source (Prosolia Inc., Indianapolis, IN, USA) coupled to a Synapt G2 HDMS mass spectrometer (Waters, Milford, MA, USA) in positive-ion mode. The m/z range acquired for each pixel was 50–250. A DESI spray of 1% acetic acid (ACS Reagent grade; Sigma-Aldrich, St. Louis, MO, USA) in acetonitrile (OmniSolv LC-MS grade; EMD Millipore Corporation, Billerica, MA, USA) with a flow rate of 5 μL min⁻¹ and a N₂ nebulizing gas (150 psi) was used for maximum sensitivity and minimal impact spot size. Stage motion was programmed such that data were acquired in 200 μm × 200 μm pixels for a total image size of 30 mm². Mass spectral data were converted using Firefly conversion software (Prosolia Inc., Indianapolis, IN, USA) into Analyze 7.5 format and images were visualized using omniSpect (http://omnispect.bme.gatech.edu/) and DetectTLC.
The APZ reaction mixture was imaged using an OmniSpray DESI probe (Prosolia Inc., Indianapolis, IN, USA) spraying acetonitrile at a flow rate of 8 μL min⁻¹ and a N₂ nebulizing gas pressure of 130 psi. The sample was rastered using a motorized microscope stage (OptiScan II; Prior Scientific Inc., Rockland, MA, USA) controlled by a LabVIEW VI described elsewhere [21, 24], moving at a speed of 160 μm s⁻¹ with a line step of 200 μm. Mass spectra were acquired on an Exactive Plus Orbitrap mass spectrometer (Thermo Scientific, San Jose, CA, USA) in positive-ion mode as centroided data with the following settings held constant throughout the experiment: mass resolution (35,000), number of microscans (2), automatic gain control target (3 × 10⁶), maximum injection time (50 ms), spray voltage (4 kV), capillary temperature (250°C), and S lens rf level (60 V). The imaging data were acquired in a multimodal fashion by alternating each scan between Full MS (no collision energy applied) and All-Ion Fragmentation (AIF, collision energy = 60 arb.) modes. There is no precursor ion selection during AIF scans prior to higher-energy collisional dissociation (HCD); therefore, all ions entering the HCD cell were fragmented and detected. In this manner, consecutive pixels in the MS image alternated between the two modes. The dataset may be visualized as (1) a complete dataset with pixels from both modes included, or (2) separate Full MS or AIF images after separating the scan modes using Xcalibur software (Thermo Scientific, Inc.). The pixel area corresponding to a single scan was 80 × 200 μm for the complete dataset, based on stage motion and data acquisition speed. When the dataset is separated into the individual Full MS or AIF datasets, the m/z vs time data is interpolated, resulting in a doubling of effective pixel size (160 × 200 μm). RAW data files were converted to the .mzXML format using ‘msconvert’ for visualization using omniSpect and DetectTLC.
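The pixel-size arithmetic above can be checked with a short Python sketch. It assumes, for illustration, that scans within a line strictly alternate starting with Full MS, and it infers a scan period of 0.5 s from the stated 80 μm pixel width and 160 μm s⁻¹ stage speed; that period is a derived value, not one reported in the text. The function names are hypothetical, and the separation itself was done in Xcalibur, not with code like this.

```python
def split_alternating_scans(scans):
    """Split one line of scans into (Full MS, AIF), assuming strict alternation
    that starts with a Full MS scan."""
    return scans[0::2], scans[1::2]


def pixel_width_um(stage_speed_um_s, seconds_between_scans):
    """Distance the stage travels between two consecutive scans of an image."""
    return stage_speed_um_s * seconds_between_scans


line = ["F0", "A0", "F1", "A1", "F2", "A2"]  # toy labels for one raster line
full_ms, aif = split_alternating_scans(line)

scan_period_s = 80 / 160  # inferred: 80 um pixel width at 160 um/s
combined_width = pixel_width_um(160, scan_period_s)       # both modes interleaved
separated_width = pixel_width_um(160, 2 * scan_period_s)  # one mode only
print(full_ms, aif, combined_width, separated_width)
```

Because each mode is sampled only on every second scan, the spacing between same-mode pixels is twice the combined spacing, which is consistent with the 80 μm and 160 μm widths quoted above.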
Results and Discussion
NMF and PCA were applied to the 2D-HPTLC-MSI datasets to assess the ability of these models to identify individual product spots (Supplementary Figures SI 2, 3). For NMF, the data were separated into five components. Three of the component images showed a combination of 1–3 spot-like features, at two unique spatial distributions. The two other components resulted in images and spectra corresponding to chemical background. For PCA, the first five components accounted for 73% of the variation within the dataset. PC 1 and PC 2 showed spot-like features, each with unique spatial distributions. PC 3–5 largely represented chemical noise within the dataset. These results are to be expected considering that both the NMF and PCA models aim to identify predominant patterns within the dataset, not to extract a complete set of spot-like features. The two spot-like features identified by NMF and PCA are the areas with the highest ion intensities, indicating a notable pattern within the dataset. However, additional product spots may have high chemical significance yet are not identified by NMF or PCA, making these methods unsuitable for reliable TLC plate analysis. These results highlight the need for the DetectTLC tool, which can automatically characterize the chemical diversity within a 2D TLC separation.
Fully automated analysis of the MSI data using DetectTLC generated 50+ images containing individual spots that corresponded to various analytes on the TLC plate. Included in those were the known products, as shown in Figure 1d.
Experimental m/z values for these products were consistent with theoretical values, with an average error of 1 mmu. Interestingly, a systematic difference of approximately 5 mmu on average was observed between the raw ion image m/z values and the DetectTLC images, which was attributed to the centroiding process used. Mass spectra were acquired in profile mode, but the m/z peaks observed are not truly Gaussian; therefore, the local m/z maximum value observed differed slightly from the centroided m/z value used by DetectTLC. Differences in image contrast between the manually-obtained and DetectTLC images were also observed and attributed to the median filtering step that removes background noise pixels in the DetectTLC image processing pipeline, discussed below.
One key advantage of DetectTLC is its ability to identify reaction products that may not be anticipated. When the PZO mixture was separated by a second solvent system “B”, a new, intense fluorescence spot, different from 1–3, was detected (Supplementary Figure SI 1). DetectTLC was applied to the MSI dataset generated with the new solvent system, and the results included an ion image at m/z 167.0353 co-localized with the unknown fluorescent spot, yielding a chemical formula of C8H11N2O2 (Δm = 0.8 mmu). Taking into consideration the reaction chemistry, this species was tentatively identified as 3,5(6)-dimethyl-4-oxoethyl-2-oxo-pyrazin-4-ium, a plausible reaction side-product (Supplementary Schemes SI 1, 4).
The m/z 167.0353 image ranked 13th in the DetectTLC results and was displayed on the first page of 24 images in the main graphical user interface (GUI) window. This exemplifies the ability of DetectTLC to detect chemical species in an untargeted, automatic manner, revealing species that may have gone undetected with a more targeted approach.
Various processing pipelines within the DetectTLC framework were assessed for their ability to identify TLC spots. Each possible image processing combination, 128 pipelines in all, was evaluated based on the number of true spots identified in the top 40 images generated when applied to the PZO synthesis dataset. A full assessment of the results for all pipelines is included in Supporting Information (Tables SI 2, 3), along with the top 40 images found by the area, compactness, and entropy features (Supplementary Figures SI 6–8). It was observed that for thresholding purposes, Otsu’s method and selected manual thresholds yielded comparable results across all other processing variables. The key difference between these two approaches is the user input required for manual thresholding versus the user-independent Otsu’s method. In almost every combination of processing methods, morphologic opening yielded a higher number of spot-containing images than dilation (on average twice as many true spots for all combinations). The other processing step that significantly affected the number of true spots identified was median filtering. For this dataset, the 7 × 7 median filter was found to perform notably better than no filter. The user can input different filter sizes through the Advanced Options menu and select the most appropriate size for the dataset under analysis.
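To show why Otsu's method needs no user input, here is a minimal pure-Python sketch of the standard algorithm: it builds an intensity histogram and picks the split that maximizes between-class variance. This is a generic textbook version written for illustration, not the DetectTLC (MATLAB) code; the function name, the bin count, and the toy intensities are assumptions.

```python
def otsu_threshold(values, bins=256):
    """Return the intensity threshold that maximizes between-class variance."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return lo  # flat image: nothing to separate
    width = (hi - lo) / bins
    hist = [0] * bins
    for v in values:
        hist[min(int((v - lo) / width), bins - 1)] += 1

    total = len(values)
    sum_all = sum(i * h for i, h in enumerate(hist))
    best_var, best_bin = -1.0, 0
    w0, sum0 = 0, 0.0
    for i, h in enumerate(hist):
        w0 += h                      # pixels at or below bin i (background class)
        if w0 == 0:
            continue
        w1 = total - w0              # pixels above bin i (foreground class)
        if w1 == 0:
            break
        sum0 += i * h
        mu0 = sum0 / w0
        mu1 = (sum_all - sum0) / w1
        var_between = w0 * w1 * (mu0 - mu1) ** 2
        if var_between > best_var:
            best_var, best_bin = var_between, i
    return lo + (best_bin + 1) * width  # upper edge of the best background bin


# Toy bimodal data: low background values and a few bright "spot" pixels.
pixels = [1, 2, 1, 2, 1, 50, 52, 51]
t = otsu_threshold(pixels)
mask = [v >= t for v in pixels]
print(t, mask)
```

A manual threshold would replace `t` with a user-chosen number; everything downstream stays the same, which is consistent with the comparable results reported for the two approaches.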
Ultimately, the analysis pipeline of (1) removal of images with <5 or >1500 non-zero pixels, (2) 7 × 7 median filtering, (3) Otsu’s threshold, and (4) morphologic opening, provided the best results for the PZO dataset. Combining this pipeline with entropy as the image feature for ranking, two of the known products in the PZO reaction mixture were included in the top five images output by DetectTLC, and all three predicted products were within the top 40 ranked images. For other chemical systems, DetectTLC settings could be tailored to provide optimum results and to obtain a complete list of m/z images containing spots.
To confidently assign analyte structure by mass spectrometry, MS fragmentation data must be obtained. Modern mass spectrometers allow for data to be acquired in multiple modes sequentially, in real time. For example, alternating between low and high collision-energy scans enables data corresponding to precursor and product ions to be acquired in a single acquisition cycle. In true tandem mass spectrometers, the precursor ion is first selected for activation, but when many analytes are simultaneously ionized as in MSI, it becomes challenging to mass-select all individual precursor ions while maintaining high imaging throughput. Matching of precursor to product ions in alternating scan experiments can, for example, be performed based on 1D chromatographic profile similarity.
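The four steps listed above can be sketched in pure Python as follows. This is a simplified illustration under stated assumptions, not the DetectTLC implementation: the threshold is passed in as a number (Otsu's method would supply it in the pipeline described above), windows at the image border use only in-bounds pixels, the structuring element is a 3 × 3 square, and the toy example uses a 3 × 3 median filter so that it fits a 9 × 9 image. All function names are hypothetical.

```python
from statistics import median


def _window(img, y, x, r):
    """Values in the (2r+1) x (2r+1) neighbourhood, clipped at the image border."""
    h, w = len(img), len(img[0])
    return [img[j][i]
            for j in range(max(0, y - r), min(h, y + r + 1))
            for i in range(max(0, x - r), min(w, x + r + 1))]


def median_filter(img, size=7):
    r = size // 2
    return [[median(_window(img, y, x, r)) for x in range(len(img[0]))]
            for y in range(len(img))]


def opening(mask, size=3):
    """Binary erosion followed by dilation with a square structuring element."""
    r = size // 2
    shape = range(len(mask)), range(len(mask[0]))
    eroded = [[all(_window(mask, y, x, r)) for x in shape[1]] for y in shape[0]]
    return [[any(_window(eroded, y, x, r)) for x in shape[1]] for y in shape[0]]


def process(img, threshold, min_pixels=5, max_pixels=1500, filter_size=7):
    """Return a cleaned binary mask, or None if the image is rejected in step 1."""
    nonzero = sum(1 for row in img for v in row if v != 0)
    if nonzero < min_pixels or nonzero > max_pixels:   # (1) pixel-count filter
        return None
    smoothed = median_filter(img, filter_size)         # (2) median filtering
    mask = [[v > threshold for v in row] for row in smoothed]  # (3) threshold
    return opening(mask)                               # (4) morphologic opening


# Toy image: a 5 x 5 spot plus one isolated noise pixel in the top-right corner.
img = [[0.0] * 9 for _ in range(9)]
for y in range(2, 7):
    for x in range(2, 7):
        img[y][x] = 10.0
img[0][8] = 10.0

mask = process(img, threshold=5.0, filter_size=3)
spot_pixels = sum(v for row in mask for v in row)
print(spot_pixels, mask[4][4], mask[0][8])
```

In this toy case the isolated pixel is removed by the median filter while the body of the spot survives, which mirrors the role the text assigns to median filtering and opening.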
Similarly, DetectTLC includes algorithms capable of automatically pairing precursors and product ions from multimodal imaging datasets using their 2D spatial distributions on the TLC plate. This kind of similarity matching is performed in DetectTLC by first allowing the user to select the precursor ion spot of interest from a low collision-energy image (Supplementary Figure SI 9). This region of interest is then used to generate a template against which to measure similarity of the high collision-energy images using either Pearson correlation or the hypergeometric similarity measure [27, 28]. The results are displayed as a set of ion images, but the user also has the option of generating an all-fragment mass spectrum displaying peaks for m/z values with similar spatial distribution as the precursor. Further details of the similarity algorithms implemented in DetectTLC are provided under Supporting Information.
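A minimal Python sketch of the Pearson-correlation variant of this matching is given below: a template image derived from the selected precursor spot is compared with each high collision-energy ion image, and candidate fragments are ranked by correlation. It is an illustration under assumptions, not the DetectTLC code; the function names, the dictionary layout, and the m/z keys are hypothetical, and the hypergeometric similarity measure is not shown.

```python
import math


def pearson(a, b):
    """Pearson correlation of two equal-length sequences (0.0 if either is flat)."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    if var_a == 0 or var_b == 0:
        return 0.0
    return cov / math.sqrt(var_a * var_b)


def rank_fragments(template, fragment_images):
    """Return (m/z, correlation) pairs sorted from most to least similar."""
    flat_t = [v for row in template for v in row]
    scores = {mz: pearson(flat_t, [v for row in img for v in row])
              for mz, img in fragment_images.items()}
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)


# Template: binary region of interest around the chosen precursor spot.
template = [[0, 1, 1],
            [0, 1, 1],
            [0, 0, 0]]
# Toy high collision-energy images keyed by arbitrary m/z values.
fragments = {
    90.0: [[0, 4, 5], [0, 6, 5], [0, 0, 0]],   # co-localized with the template
    75.0: [[0, 0, 0], [2, 0, 0], [3, 0, 0]],   # located elsewhere on the plate
}
ranked = rank_fragments(template, fragments)
print(ranked)
```

The highest-ranked m/z values are the candidates one would plot as an all-fragment spectrum for the selected precursor.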
Through automated analysis of MSI data, DetectTLC is able to pinpoint spots on 2D HPTLC plates corresponding to both expected and unexpected reaction products, including those undetectable via fluorescence, or those that co-migrated, preventing selective optical visualization. This type of processing ensures that the user gains information describing all chemical species present on the plate regardless of their optical or chromatographic properties. Structural identification of reaction products displaying spot-like regions can be accomplished when multimodal imaging is performed, detecting intact and fragment ions in a single acquisition. Although DetectTLC was designed for processing TLC-MSI datasets, it could also be more broadly applied to MSI datasets where other types of spot-like features are expected.
The authors acknowledge support for this research by the Center for Chemical Evolution, jointly supported by NSF and the NASA Astrobiology Program (NSF CHE-1004570). Initial work was supported by a seed grant from the BioImaging Mass Spectrometry Initiative at Georgia Tech to M.D.W. and F.M.F. M.D.W. also acknowledges support from The Parker H. Petit Institute for Bioengineering and Bioscience (IBB), Johnson & Johnson, National Institutes of Health (R01CA108468, U54CA119338), Georgia Cancer Coalition (Distinguished Cancer Scholar Award), and Microsoft Research. C.D.K. acknowledges support from an NSF GRF and a P.E.O. International Scholar Award.
- 18. Stanley, M.S., Busch, K.L., Vincze, A.: Imaging analysis of thin layer chromatograms by secondary ion mass spectrometry: analysis of neostigmine and pyridostigmine bromides. J. Planar Chromatogr.–Mod. TLC 1, 76–78 (1988)
- 21. Parry, R.M., Galhena, A.S., Gamage, C.M., Bennett, R.V., Wang, M.D., Fernandez, F.M.: omniSpect: an open MATLAB-based tool for visualization and analysis of matrix-assisted laser desorption/ionization and desorption electrospray ionization mass spectrometry images. J. Am. Soc. Mass Spectrom. 24, 646–649 (2013)
- 27. Kaddi, C., Parry, R.M., Wang, M.D.: Hypergeometric similarity measure for spatial analysis in tissue imaging mass spectrometry. Proceedings of the 2011 IEEE International Conference on Bioinformatics and Biomedicine, pp. 604–607 (2011)
- 28. Kaddi, C.D., Parry, R.M., Wang, M.D.: Multivariate hypergeometric similarity measure. IEEE/ACM Trans. Comput. Biol. Bioinform. (2013)