Multivariate Analysis of MALDI Imaging Mass Spectrometry Data of Mixtures of Single Pollen Grains
Mixtures of pollen grains of three different species (Corylus avellana, Alnus cordata, and Pinus sylvestris) were investigated by matrix-assisted laser desorption/ionization time-of-flight imaging mass spectrometry (MALDI-TOF imaging MS). The number of pollen grains was reduced stepwise from > 10 to single pollen grains. For sample pretreatment, we modified a previously applied approach so that any additional extraction steps were omitted. Our results show that characteristic pollen MALDI mass spectra can be obtained from a single pollen grain, which is the prerequisite for a reliable pollen classification in practical applications. MALDI imaging of laterally resolved pollen grains provides additional information by reducing the complexity of the mass spectra of mixtures, in which peak discrimination is frequently observed. Combined with multivariate statistical analyses, such as principal component analysis (PCA), our approach offers the chance for a fast and reliable identification of individual pollen grains by mass spectrometry.
Keywords: MALDI imaging MS; Pollen grains; Multivariate statistics; Hierarchical cluster analysis; Principal component analysis
Current techniques for the determination of pollen are predominantly based on the microscopic evaluation of pollen grains collected in so-called pollen traps, such as the Burkard spore sampler. Usually, pollen and other airborne particles, such as fungal spores, are collected for 7 days on an adhesive, transparent polyester tape mounted on a drum with a fixed circumference. However, the microscopic evaluation requires skilled employees, who investigate only a few randomly selected regions of the tape. Moreover, the sampling is strongly influenced by local environmental effects, e.g., by single bushes and trees in the close vicinity of the trap. Since the number of pollen traps is limited (e.g., only 51 traps in Germany), a global underestimation of pollen species must be considered. Combined with the microscopic determination, these data represent the basis for establishing calendars used for pollen forecasts.
Thus, alternative techniques for a reliable and fast identification of pollen grains based on spectroscopic and spectrometric methods are desirable. The use of Fourier-transform infrared spectroscopy (FTIR), Raman spectroscopy, and surface-enhanced Raman scattering (SERS) for the characterization of pollen samples was reported recently by several groups [2, 3, 4, 5, 6, 7]. A classification of pollen species could be obtained in combination with multivariate statistics such as hierarchical cluster analysis (HCA), partial least squares regression (PLSR), and principal component analysis (PCA) [4, 8, 9]. Autofluorescence represents another technique that has been successfully applied for the classification of different pollen species. Specifically in combination with morphological properties (e.g., size), the identification of various species based on the blue/red autofluorescence ratio was described.
In recent years, our group established MALDI-TOF MS as a new technique for the identification and classification of pollen grains [12, 13, 14]. In contrast to MALDI-TOF MS approaches that investigated purified pollen compounds, including lipids, proteins, glycans, saccharides, and sporopollenin, for structure elucidation [15, 16, 17, 18, 19, 20, 21, 22, 23], MALDI-based pollen species identification does not require extraction, chemical modification, or chromatographic separation of particular molecular species and can therefore be regarded as a simple and fast technique. Furthermore, the applicability of this technique for the characterization of different species in mixtures consisting of larger amounts of pollen grains was shown. In combination with PCA, we identified specific mass regions and peak patterns to determine pollen genus. Moreover, MALDI-TOF MS also enables the differentiation of various pollen species within the same genus. Initially, our MALDI-TOF MS approach required separate fixation and extraction steps followed by addition of the matrix. This still time-consuming preparation was then simplified by fixing pollen grains on the target with an adhesive carbon tape and combining the extraction agents with the matrix. Kuwayama et al. showed that the conductive tape is also applicable in imaging MS of plant pieces.
Here, we present an advanced approach based on this simplified sample preparation procedure combined with imaging mass spectrometry. Various numbers of pollen grains (from < 10 to 1) were deposited on a carbon tape and covered by matrix. To determine the sensitivity of the mass spectrometer, mixtures of three different pollen species with different numbers of grains of each species were analyzed by MALDI imaging MS. Since peak suppression in MALDI-TOF mass spectra of mixtures should be considered, our approach compared two different spatial resolutions [26, 27, 28, 29]. As we will demonstrate here, it is possible to combine MALDI imaging of single pollen grains with chemometric tools in order to improve classification and identification based on a database of MALDI-TOF pollen mass spectra. This database now contains MALDI mass spectra of several hundred species of different orders (8) and genera (53). Thereby, we utilize the potential of multivariate tools to exploit the full molecular range [30, 31] for imaging of pollen grain mixtures.
Three pollen samples (Corylus avellana, Alnus cordata, and Pinus sylvestris) of two plant orders (Coniferales and Fagales) were collected in the Botanic Garden of Berlin. The pollen grains were stored at −20 °C until use. The size of ten pollen grains of each species was determined using a microscope camera (Olympus UC 90). Photomicrographs with magnifications of 20× and 100× are shown in the suppl. part in Figure S1a–c. Their statistical evaluation is shown in Figure S2.
In the first experiment, different numbers of pollen grains (< 10 to 1) were deposited separately on an MTP 384 standard target that was previously covered with a double-faced adhesive carbon tape (P77817, Science Services GmbH, Munich, Germany). In the second experiment, mixtures of pollen grains (Corylus, Alnus, and Pinus, each with 10, 5, 3, and 1 pollen grains) were prepared on the tape. For each experiment, 0.5 μL of α-cyano-4-hydroxycinnamic acid (HCCA) matrix solution (10 mg mL−1 in acetonitrile:water (1:1, v:v) containing 1.25% trifluoroacetic acid) was spotted onto the pollen grains. After solvent evaporation, the target was inserted into the mass spectrometer.
For all measurements, an Autoflex speed MALDI-TOF mass spectrometer (Bruker Daltonik GmbH, Bremen, Germany) equipped with a Smartbeam-II laser (355 nm) was used. In the first experiment, spectra were recorded in positive linear mode by accumulating 5000 laser pulses in a mass range of m/z 2000 to 12,000.
To check the possible influence of pollen grain sizes on a mass shift of the peaks, an additional test was performed using a MALDI instrument (rapifleX®, Bruker Daltonik) equipped with a probe stage of alterable height. A peptide standard mixture (Bruker Daltonik) was measured. The variation by 25 μm resulted in a mass shift of 0.2 Da for somatostatin (m/z 3147.47) (shown in the suppl. part in Figure S3).
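To put this height-induced shift into relative terms, the short sketch below converts the reported 0.2 Da shift at m/z 3147.47 into parts per million. The function name is illustrative and not part of the original workflow.

```python
def mass_shift_ppm(shift_da: float, mz: float) -> float:
    """Relative mass shift in parts per million."""
    return shift_da / mz * 1e6


# Values reported in the text for somatostatin at a 25 µm height variation.
shift_ppm = mass_shift_ppm(0.2, 3147.47)
print(f"{shift_ppm:.1f} ppm")
```

The result is on the order of tens of ppm, which is small compared with the step size of 2 used for interpolating the spectra.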
For the imaging experiments, the manufacturer’s FlexImaging™ software was used for acquisition. One thousand laser pulses were accumulated for each spot. Two different pixel sizes of 50 and 100 μm, which define the spatial resolution, were applied. For better visualization, all images were additionally evaluated using multivariate methods.
Spectral pretreatment and multivariate analysis were performed using Matlab (Version R2015a, Mathworks, Inc., Natick, MA, USA). For this purpose, we compiled a sequence of standard functions in Matlab and its Statistics and Machine Learning toolbox, specifically interp1 (for interpolation), baseline (for baseline correction), pdist and linkage (for HCA), and princomp (for PCA). Raw spectra were interpolated in the mass region between m/z 2000 and 12,000 with a step size of 2, followed by baseline correction and vector normalization. Data sets were analyzed by HCA using Euclidean distances and Ward’s algorithm. The number of clusters was chosen manually based on a heterogeneity value (linkage distance) between 1.3 and 1.5. As another approach to assess spectral differences, PCA was applied to gain more information from the mixture images.
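The pretreatment sequence described above (interpolation to a common m/z axis, baseline correction, vector normalization) can be sketched in Python with NumPy. This is a minimal illustration and not the authors' Matlab code: the function name `preprocess`, the rolling-minimum baseline estimate, and the `window` parameter are assumptions made here, since the text does not specify the baseline algorithm.

```python
import numpy as np


def preprocess(mz, intensity, lo=2000.0, hi=12000.0, step=2.0, window=51):
    """Interpolate, baseline-correct, and vector-normalize one spectrum.

    `mz` must be increasing. `window` (odd number of grid points) controls
    the assumed rolling-minimum baseline; it is a hypothetical parameter.
    """
    n_points = int(round((hi - lo) / step)) + 1
    grid = np.linspace(lo, hi, n_points)        # m/z 2000 to 12,000, step 2
    y = np.interp(grid, mz, intensity)          # linear interpolation

    # Assumed baseline: minimum intensity inside a sliding window.
    half = window // 2
    padded = np.pad(y, half, mode="edge")
    windows = np.lib.stride_tricks.sliding_window_view(padded, window)
    corrected = y - windows.min(axis=1)

    # Vector normalization: scale the spectrum to unit Euclidean length.
    norm = np.linalg.norm(corrected)
    if norm > 0:
        corrected = corrected / norm
    return grid, corrected


# Synthetic spectrum: constant offset plus one peak near m/z 4106.
raw_mz = np.linspace(1900.0, 12100.0, 3000)
raw_int = 5.0 + 100.0 * np.exp(-0.5 * ((raw_mz - 4106.0) / 5.0) ** 2)
grid, spectrum = preprocess(raw_mz, raw_int)
print(len(grid), grid[np.argmax(spectrum)])
```

After this step every spectrum shares the same m/z axis and has unit length, so Euclidean distances between spectra reflect differences in peak pattern rather than in absolute intensity.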
Results and Discussion
MALDI-TOF MS of Single Pollen Grains
As shown in Figure 1, for each plant species (Corylus avellana, Alnus cordata, and Pinus sylvestris), mass spectra with individual peak patterns in different mass ranges were obtained. The spectra of Corylus avellana (Figure 1a) show intense peaks at m/z 2944 and 4106 and a specific peak pattern between m/z 5200 and 6894. The spectra of Alnus cordata (Figure 1b) reveal four intense peaks at m/z 3444, 5210, 5602, and 6884. Compared to the spectra of these species, the spectra of Pinus sylvestris (Figure 1c) show less intense peaks, and most peaks can be found in the lower mass region (m/z 3947–4658) of the spectra. These results are in good agreement with our previously published data [12, 14, 24]. In one of these publications, we tried to elucidate the structure of such molecular classifiers by MALDI-TOF MS/MS fragmentation analysis. Typical patterns were obtained that could be attributed to sugar units (e.g., hexose, fucose, pentose, and sialic acid). However, the fragmentation of precursor ions with masses larger than m/z 3000 (which represent the majority in the pollen spectra) was not feasible. In a detailed study, Fraser et al. were able to identify carotenoids by MALDI-TOF MS in a mass range from m/z 530 to 548. Higher mass sporopollenin was found in an m/z region of 4400–4800, also using MALDI-TOF MS. A more detailed analysis of other molecular classifiers (e.g., glycans, lipids, peptides, and proteins) would require additional biochemical analyses (e.g., using tryptic digestion) and the use of LC-MS/MS techniques.
For the Corylus avellana (Figure 1a) and Alnus cordata (Figure 1b) samples, even one single pollen grain shows the same characteristic peak patterns, whereas for Pinus sylvestris (Figure 1c), the spectra show less intense and less characteristic signals. With an increasing number of pollen grains, the spectra display an increase in peak intensities accompanied by a decrease in background noise. The noisy signals, especially in the low mass range, might be the result of matrix and tape interference. Moreover, additional small peaks are visible in the higher mass range from m/z 9000 to 12,000 when higher numbers of pollen grains (> 10 pollen grains) are used.
Thus, the pollen of Corylus avellana (Figure 1a) and Alnus cordata (Figure 1b) show species-specific peak patterns at single pollen level. Nevertheless, the peaks of the averaged spectra of all three species were applied as a reference for the following imaging experiments.
MALDI-TOF Imaging MS of Pollen Grain Mixtures (Resolution 100 μm)
In the following part, a set of pollen grain mixtures of the three species each with 10, 5, 3, and 1 pollen grains was investigated by MALDI imaging MS applying a spatial resolution of 100 μm. Thereby, the spatial resolution was set by the spot-to-spot distance. A direct visualization of single pollen grains in the pictures was not feasible since grain sizes were too small and the glue used on the conductive tape reflected too much. Another reason was the coverage of the grains by a droplet of matrix. This step can also lead to a dislocation of the pollen grain within the droplet.
Considering the annotated peaks of the spectra used to generate cluster 1 (Figure 2b, violet), this cluster contains information from all three pollen species. In contrast, more differences can be seen between the averaged spectrum in cluster 2 and cluster 3. The averaged spectrum in cluster 2 shows a more species-specific peak pattern for Alnus, while the spectra in cluster 3 are more similar to pollen spectra of Corylus. By comparing these results with the HCA image (Figure 2c), an enrichment of Alnus pollen grains can be assumed in the center of the image (red), whereas Corylus pollen grains (blue) are more located towards the right-hand side. Cluster 4, which mostly consists of spectra with low signal-to-noise ratios (Figure 2b), represents a transition region between pollen extract and background (Figure 2c, dark gray).
In the following experiments, the amount of pollen grains was decreased. The results of the evaluation of a mixture composed of five grains of each species are shown in the suppl. part in Figure S4. The spectra of the clusters 1, 2, and 4 (Figure S4b) show species-specific peak patterns (blue, violet, and red), whereas cluster 3 mostly contains noisy signals. Different specific peaks can be identified comparing the spectra of cluster 1 and 4, whereas cluster 2 contains a mixture of spectral information of the two other clusters. The averaged spectrum of cluster 1 is similar to the Corylus reference spectra (see Figure 1a). In the averaged spectrum of cluster 2, the peak at m/z 4106 can be attributed to Corylus species (see Figure 1a). Simultaneously, the peak at m/z 4660 can be attributed to Pinus (see Figure 1c), and peaks at m/z 3446 and m/z 6888 match well with peaks of the Alnus reference spectra (see Figure 1b). Finally, the average spectrum of cluster 4 coincides with the reference Alnus cordata spectra.
From the corresponding image (Figure S4c), we conclude that the pollen grains are mainly distributed in the bottom part of the sample. Here, the Corylus pollen extract is located on the left-hand side of the image (blue), while a very localized distribution (400 × 200 μm) of Alnus pollen (red) could be detected next to the Corylus pollen extract. Further, an area on the right-hand side is classified as the above described mixture of pollen extracts (cluster 2).
The data of an experiment conducted with a further decreased pollen grain number (3 of each) are shown in Figure S5. Figure S5a contains 115 dots, each of which represents one mass spectrum. Based on the recorded molecular information of the image spectra, the individual spectra were assigned to four clusters (Figure S5b and S5c) using HCA. Compared to the previous imaging experiment (Figure S4), fewer differences are present in the averaged spectra (Figure S5b). Cluster 1 contains mostly background signals, whereas in the spectra of cluster 2, many different peaks are present that can also be found, with higher intensities, in cluster 3. Finally, in the averaged spectrum of cluster 4, a peak pattern in the mass range m/z 4500–5000 is visible. As indicated by the reference spectra in Figure 1, the annotated peaks of cluster 2 mark a transition region between the pollen grains and the background, while cluster 3 can be assigned to Corylus avellana spectra (compare Figure 1a). We therefore assume that Corylus pollen grains are present in the blue region (400 × 300 μm) of the Figure S5c image (cluster 3).
Another experiment conducted with only one pollen grain of each species is depicted in Figure S6. The averaged HCA spectra of the four clusters (Figure S6b) show high similarities, which made a differentiation of different pollen species more complicated. In most cases, the annotated peaks do not match with the peaks of the reference spectra (Figure 1).
So far, we have shown that MALDI imaging of pollen grain mixtures is possible and that spectra with reliable peak information can be obtained. Recorded peak patterns can be differentiated by HCA and visually assigned to reference spectra. Furthermore, we showed that HCA of MALDI imaging MS experiments is a suitable tool for higher numbers of pollen grains (> 5 pollen grains) but fails in the analysis of single pollen grains. Here, a crucial point is the separation of the measurement data into pollen grain, pollen extract, and background clusters.
As recently reviewed, the spatial resolution in imaging MS is important in determining the quality of molecular pictures. In order to collect as much information from the extracts around the pollen grains as possible and to better distinguish between pollen grains of different species adjacent to one another, the pixel size was reduced to 50 μm, i.e., the lateral resolution was increased. Furthermore, for analysis of the data, PCA was applied in addition to classification by HCA.
MALDI-TOF Imaging MS of Pollen Grain Mixtures (Resolution 50 μm)
As shown in Figure 3b, c, the 631 spectra obtained from that mixture formed a total of nine clusters in the HCA. All clusters that contained noisy spectra (535 of 631 spectra) were pooled (clusters 1–5, Figure 3b). The second group in Figure 3b summarizes information from two individual clusters (clusters 6 + 7) that showed spectral similarities. Here, a mixture of peak patterns of different species as well as noisy signals could be observed. Furthermore, two clusters with peaks characteristic of Corylus (cluster 8) and Alnus (cluster 9) can be distinguished by HCA (Figure 3b).
Figure 3c shows the corresponding HCA image, indicating the assignment of each spectrum to one of the clusters. Here, the positions of the background spectra are depicted in gray (clusters 1–5). The positions of Corylus pollen spectra are colored in blue (cluster 8), whereas the spectra assigned to Alnus pollen grains are shown in red (cluster 9). The violet regions were obtained from spectra of clusters 6 and 7, where distinct species-specific information was difficult to assign. These regions are presumed to contain diluted extracts of several pollen species.
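How an HCA image of this kind can be assembled is sketched below with SciPy: spectra are clustered with Euclidean distances and Ward's algorithm, the tree is cut at a fixed linkage distance, and each cluster label is written back to the pixel position of its spectrum. The function name `hca_image`, the synthetic data, and the assumption that a SciPy linkage distance of 1.4 corresponds to the heterogeneity value used in the paper are illustrative choices, not details confirmed by the source.

```python
import numpy as np
from scipy.cluster.hierarchy import fcluster, linkage


def hca_image(spectra, coords, shape, cut=1.4):
    """Cluster spectra (one per row) and map the labels onto the pixel grid."""
    tree = linkage(spectra, method="ward", metric="euclidean")
    labels = fcluster(tree, t=cut, criterion="distance")
    image = np.zeros(shape, dtype=int)          # 0 marks pixels without spectrum
    for (row, col), label in zip(coords, labels):
        image[row, col] = label
    return image, labels


# Synthetic example: six pixels, two artificial peak patterns plus weak noise.
rng = np.random.default_rng(0)
pattern_a = np.zeros(50)
pattern_a[10] = 1.0
pattern_b = np.zeros(50)
pattern_b[30] = 1.0
spectra = np.vstack([pattern_a] * 3 + [pattern_b] * 3)
spectra = spectra + 0.01 * rng.random((6, 50))
spectra /= np.linalg.norm(spectra, axis=1, keepdims=True)   # vector normalization
coords = [(0, 0), (0, 1), (0, 2), (1, 0), (1, 1), (1, 2)]

image, labels = hca_image(spectra, coords, shape=(2, 3))
print(image)
```

In this toy case the top row and the bottom row of the image receive different cluster labels. Pooling several noisy clusters into one background class, as done for clusters 1–5, would be an additional relabeling step on `image`.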
The suitability of evaluating MALDI imaging MS data with a combination of HCA and PCA to obtain 2D and 3D information on tissue material was initially reported by Deininger et al. and Weaver et al. [36, 37]. Therefore, in addition to our previously shown HCA (Figure 3a–c), a PCA was performed on the same data set. A similar data presentation of PCA score plots was shown by Eijkel et al. and Amstalden van Hove et al. [38, 39]. With respect to our data, the loadings of the first three principal components (PC) are shown in Figure 3d, and the corresponding positive and negative images of the PCA scores are displayed in Figure 3e. Additionally, within the loading plots (Figure 3d), the percentages of the total variance represented by each component are given.
In the loading plots (Figure 3d) of PC 1 (negative loading values) and PC 2 (positive loading values up to m/z 5000 and to some extent also the negative loading values in the spectral region up to m/z 2500), peak patterns can be found that consist mainly of background signals. In contrast, the positive and negative loadings of PC 3 show species-specific signals. Here, positive loading values, such as the peaks at m/z 3446, 5212, and 6886, can be assigned to Alnus, whereas the negative loading values at m/z 2942, 4100, and 6108 are specific for Corylus spectra. Thus, the PC 3 positive and negative score images in Figure 3e show two different regions, where either Alnus or Corylus pollen grains occurred.
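The relation between loadings, scores, and the positive and negative score images can be illustrated with a small NumPy sketch. It computes a PCA by singular value decomposition of the mean-centered spectra and splits the scores of one component into two images. The function names and the synthetic data are assumptions for illustration; the original analysis used Matlab's princomp.

```python
import numpy as np


def pca(spectra, n_components=3):
    """PCA via SVD of mean-centered spectra (one spectrum per row)."""
    centered = spectra - spectra.mean(axis=0)
    _, singular, vt = np.linalg.svd(centered, full_matrices=False)
    loadings = vt[:n_components]                    # one loading vector per row
    scores = centered @ loadings.T                  # one score per spectrum and PC
    explained = singular**2 / np.sum(singular**2)   # fraction of total variance
    return scores, loadings, explained[:n_components]


def split_score_image(pc_scores, coords, shape):
    """Separate images for the positive and the negative scores of one PC."""
    positive = np.zeros(shape)
    negative = np.zeros(shape)
    for (row, col), value in zip(coords, pc_scores):
        if value > 0:
            positive[row, col] = value
        else:
            negative[row, col] = -value
    return positive, negative


# Synthetic example: six pixels carrying two artificial peak patterns.
rng = np.random.default_rng(0)
spectra = 0.01 * rng.random((6, 50))
spectra[:3, 10] += 1.0      # pattern A in the first three pixels
spectra[3:, 30] += 1.0      # pattern B in the last three pixels
coords = [(0, 0), (0, 1), (0, 2), (1, 0), (1, 1), (1, 2)]

scores, loadings, explained = pca(spectra)
positive, negative = split_score_image(scores[:, 0], coords, shape=(2, 3))
print(np.round(explained, 3))
```

Note that the sign of a principal component is arbitrary, so a species can appear on the positive side of a component in one data set and on the negative side in another.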
A comparison of this PCA image (Figure 3e, PC 3) to the HCA image (Figure 3c) shows similar results. A separation of Corylus and Alnus pollen spectra by MALDI-TOF imaging could be achieved by both HCA and PCA. However, the identification of Pinus pollen grains in this mixture was possible with neither HCA nor PCA.
A PCA applied to the same data set is depicted in Figure 4d, e and gave similar results. The loading vector of PC 1 (Figure 4d, first plot) describes the main variance in the data set, with the negative contributions representing background signals and the positive loading signals combining several different peak patterns. Thus, the differences within these patterns, which showed up in PC 2 and PC 3, are more interesting. The positive region of the PC 2 loading (Figure 4d, middle) is similar to the positive region of PC 1 with just one exception: the peak pattern around m/z 4734 now produced negative loading signals. Moreover, the loading of PC 3 (Figure 4d, bottom) revealed the previously described difference between specific peaks for Alnus (at m/z 3444, 5210, and 6884) and Corylus (at m/z 2946 and 4106). The corresponding score values, shown in the PC 3 positive and negative score images of Figure 4e, indicate the lateral position of the Alnus and Corylus pollen grains. These areas can be found at the same positions in the sample as the corresponding assignments in the HCA image (Figure 4c).
The PCA of this data set is displayed in Figure 5d (loadings of the first three principal components (PC)) and Figure 5e (corresponding images of the PCA scores). Here again, the loading of PC 3 (Figure 5d) distinguished peaks specific for Corylus spectra (as positive loading signals) and peaks specific for Alnus spectra (negative loading signals). The position of both pollen grain species on the target, given by the score images of PC 3+ and − (Figure 5e), corresponds well to the HCA image (Figure 5c).
Identification of Pollen Grains in Mixtures Using Independent Reference Samples
Based on this differentiation and similar to Jansen et al. (POCHEMON), the 631, 870, or 622 spectra, recorded in the previously shown imaging runs (Figures 3, 4, and 5), were projected into the reference PCA (Figure 6c–e, top, violet dots). Subsequently, constant threshold values (scattered lines) were set. The blue region (Corylus) was limited by score values up to −0.2 of PC 1, the red region (Alnus) started at score values above 0.1 of PC 1, and the yellow region (Pinus) was determined by score values higher than 0.3 for PC 2. Afterwards, this threshold information was transferred into the acquired images (Figure 6c–e, bottom). The automatic assignment of the PCA projections shown in Figure 6c–e completely supports the results obtained using HCA and PCA (Figures 3, 4, and 5). Individual pollen information can be observed at identical places in the images. Additionally, in the data set of the pollen grain mixture (three of each), spectra belonging to Pinus could be identified (Figure 6d, yellow pixels).
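The projection and thresholding step can be sketched as follows. Imaging spectra are projected onto the loadings of a PCA fitted on reference spectra, and the resulting PC 1 and PC 2 scores are compared with the thresholds stated above. The function names and the toy loadings are hypothetical. The text does not say how a spectrum that satisfies both a PC 1 and the PC 2 criterion is handled, so checking Pinus first is an assumption of this sketch.

```python
import numpy as np


def project(spectra, ref_mean, ref_loadings):
    """Scores of new spectra in a PCA space fitted on reference spectra."""
    return (np.asarray(spectra) - ref_mean) @ np.asarray(ref_loadings).T


def assign_species(pc1, pc2):
    """Apply the constant score thresholds given in the text."""
    if pc2 > 0.3:           # assumption: the PC 2 criterion takes precedence
        return "Pinus"
    if pc1 <= -0.2:
        return "Corylus"
    if pc1 > 0.1:
        return "Alnus"
    return "unassigned"


# Toy reference space: PC 1 and PC 2 are simply the first two variables.
ref_mean = np.zeros(3)
ref_loadings = np.array([[1.0, 0.0, 0.0],
                         [0.0, 1.0, 0.0]])
pixels = np.array([[-0.5, 0.0, 0.0],
                   [0.3, 0.0, 0.0],
                   [0.0, 0.5, 0.0],
                   [0.0, 0.0, 0.0]])

pixel_scores = project(pixels, ref_mean, ref_loadings)
assigned = [assign_species(pc1, pc2) for pc1, pc2 in pixel_scores]
print(assigned)
```

Writing each assigned label back to the pixel position of its spectrum then gives a species map comparable to the bottom panels of Figure 6c–e.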
The sensitivity of MALDI-TOF mass spectrometry is sufficient for the analysis of single pollen of various species. Moreover, MALDI imaging of pollen mixtures has been shown to be a suitable approach to detect and identify individual pollen grains in mixtures. We could also demonstrate that better coverage of the selected imaging region, which could be achieved by higher spatial resolution (smaller pixel size), was necessary for a higher sensitivity. Since the image spectra contained complex information, multivariate evaluation was essential for a successful separation and identification. Using hierarchical cluster analysis (HCA), spectra can be divided into clusters based on overall spectral differences. The visual comparison of the average cluster spectra with separately measured pollen spectra provided the first details on the different spectral features. The localization of the pollen grains was achieved by combining the cluster information with the respective spatial position of each spectrum. Performing PCA supported and confirmed the assignments. Further, the classification of recorded imaging spectra with respect to reference spectra in a variance-weighted PCA space additionally enabled an independent identification of the pollen grains in the mixtures.
The authors thank Thomas Dürbye of the Botanic Garden and Botanical Museum Berlin-Dahlem for their support in sample collection.
Janina Kneipp received funding from the European Research Council (ERC) (grant no. 259432).
- 15. Adhami, F., Leitzenberger, I., Wagner, S., Scheiner, O., Breiteneder, H.: Recombinant hevein and hevein-like domains from Hevea latex, avocado and banana bind cross-reactive IgE from latex-allergic patients. Allergy. 57, 82–83 (2002)
- 21. Raftery, M.J., Saldanha, R.G., Geczy, C.L., Kumar, R.K.: Mass spectrometric analysis of electrophoretically separated allergens and proteases in grass pollen diffusates. Respir. Res. 4, (2003)
- 22. Westphal, S., Kolarich, D., Foetisch, K., Lauer, I., Altmann, F., Conti, A., Crespo, J.F., Rodriguez, J., Enrique, E., Vieths, S., Scheurer, S.: Molecular characterization and allergenic activity of Lyc e 2 (beta-fructofuranosidase), a glycosylated allergen of tomato. Eur. J. Biochem. 270, 1327–1337 (2003)