Introduction

Mass spectrometry imaging (MSI) is a label-free technology that allows to obtain molecular and spatial information from intact tissue sections [1]. MSI has been gradually adopted for spatial-resolved metabolomics and it has been regarded as a potential tool for understanding the mechanisms underlying complex diseases such as cancer or diabetes [2]. However, the conventional organic matrices used in Matrix-Assisted Laser Desorption Ionization (MALDI) produce spectral signals that interfere in the low m/z range. This is an issue particularly in metabolomics which analyses low molecular weight compounds, so mass spectrometers are set to acquire within the m/z range where MALDI matrices exhibit most MS signals. This seriously hampers downstream metabolomics data processing [3, 4], as the matrix introduces noise, redundant variables, and variables with no biological meaning into the complex MSI datasets.

Several alternatives to the common organic matrices have been proposed to deal with exogenous contamination caused by matrix ion signals. Nanomaterials or metal layer deposition methods, for instance, dramatically reduce the number of signals related to the LDI promoting material in the low m/z range. Some examples are graphene oxide, silicon or metals such as gold, platinum or silver [5,6,7,8]. Nevertheless, even when these alternatives are used and the number of peaks related to the LDI promoting material is reduced, there is still a need to annotate them in order to reduce spectral complexity and distinguish exogenous from endogenous compounds, especially in untargeted applications.

To tackle the issue of annotating MS signals related to the LDI-promoting material several software-based solutions have been proposed. A simple approach consists of acquiring a reference area outside the sample during the MSI experiment. Under the assumption that only matrix-related peaks will be recorded, the peaks found in the outside area are then subtracted from the tissue spectrum. Given its simplicity, some variation of this procedure has been adopted by many researchers in their workflows. Expanding on this idea, Fonville et al. [9] presented a method that relies on the hypothesis that matrix-related peaks will correlate positively to a set of reference peaks outside the tissue region while endogenous peaks will correlate negatively. However, this approach has three main limitations. Firstly, due to ion suppression [10] and the formation of matrix adducts with endogenous compounds, the matrix-related peaks outside and inside the tissue region might differ. Additionally, endogenous molecules that are delocalized during the matrix application process can be misclassified as matrix-related. Finally, the method cannot distinguish a given matrix-related MS peak from an isobaric or overlapping endogenous MS peak. Thus, simplified approaches to annotate matrix-related signals are not suitable for untargeted applications such as spatial metabolomics. Recent work by Ovchinnikova et al. [11] takes a more comprehensive approach in defining three automated algorithms for off-sample ion classification. Their methods have proved to perform well when trained and validated against a “gold standard set” of ion images manually annotated by experts. However, their focus is not specifically on matrix-related peaks, but on the annotation of signals that exhibit a spatial distribution with high concentrations outside of the tissue region. For this reason, these methods focus on classifying each ion image separately as “on-sample” or “off-sample” and do not exploit relevant information such as the identity of the ion, adduct type, matrix type, etc. Additionally, since they are based in machine and deep learning methods they inherently suffer from the black box problem given that annotation results cannot be traced back and easily justified.

To solve these limitations we propose a new algorithm that relies not only on the ion images but also on the chemical information of the LDI promoting material used. The algorithm also incorporates an overlapping peak detection feature to prevent misclassification of overlapped or isobaric ions. The presented algorithm is implemented in an open-source R package freely available to facilitate its use. Additionally, the package generates a visual report to transparently justify each annotation.

In order to validate and optimize the proposed method, we opted for a well-understood LDI promoting material such as silver. The use of silver nanolayers for MSI (AgLDI MSI) has been steadily growing in recent years [6, 12,13,14,15,16,17]. The characteristic isotopic pattern of silver (\({}_{ }^{107} Ag\) and \({}_{ }^{109} Ag\), 51.84% and 48.16% abundance, respectively), as well as its well-known ionization and adduct formation allow to define a list of possible and not-possible silver-related peaks of a typical AgLDI MSI experiment. This set of possible and not-possible peaks is used as a validation list to assess the performance of the classification algorithm. A total of 14 MSI datasets acquired with an Ag-sputtered nanolayer from three different laboratories, were used for validation.

Materials and methods

Table 1 summarizes the main processing parameters for each of the 14 datasets used in this study. Datasets 1–10 were acquired in our lab and the materials, sample preparation and MSI acquisition parameters are described here. In order to overcome lab-specific bias in our study, four additional datasets were provided by collaborating laboratories. For further details about the materials, sample preparation and MSI acquisition of these datasets, refer to the original publications of Dataset 11 [18], Dataset 12 [14] and Datasets 13 and 14 [6].

Table 1 List of the 14 AgLDI MSI datasets used for validation

Materials

For the samples acquired by our group, indium tin oxide (ITO)-coated glass slides were obtained from Bruker Daltonics (Bremen, Germany). The silver-target (purity grade > 99.99%) used for sputtering was acquired from Kurt J. Lesker Company (Hastings, England).

Sample preparation

All the samples acquired by our group were obtained from mice and provided by the animal facility at the Faculty of Medicine and Health Sciences of the University Rovira i Virgili. All tissues were snap-frozen at − 80 °C after collection and kept at this temperature during shipping and storing until MSI acquisition.

The tissues were sectioned with a Leica CM-1950 cryostat (Leica Biosystems Nussloch GmbH) located at the Centre for Omics Sciences (COS) of the University Rovira i Virgili into 10 μm sections. Tissue sections were mounted on ITO coated slides by directly placing the glass slide at ambient temperature onto the section.

The sputtering system ATC Orion 8-HV (AJA International, N. Scituate, MA, USA) was used to deposit a silver nanolayer onto each tissue section. An argon atmosphere with a pressure of 30 mTorr was used to create the plasma in the gun. The working distance of the plate was set to 35 mm. The sputtering conditions were ambient temperature using DC mode at 100 W for 10 s. With these parameters, an Ag layer thickness of roughly 5 nm was obtained. The deposition times were short to prevent the substrate temperature from increasing excessively and, consequently, degrading metabolites.

LDI-MS acquisition

A MALDI TOF/TOF ultrafleXtreme instrument with SmartBeam II Nd:YAG/355 nm laser from Bruker Daltonics available at COS was used for MSI acquisition. Acquisitions were carried out by operating the laser at 2 kHz and collecting a total of 500 shots per pixel.

The TOF spectrometer was operated in positive ion, reflectron mode, in m/z ranges according to Table 1. The spectrometer was calibrated prior to MSI data acquisition using \(\left[ {Ag} \right]_{n}^{ + }\) cluster peaks as internal reference masses.

MSI data processing

The raw spectral data of each MSI dataset was exported to the imzML data format [19] in profile mode. The software rMSIproc [20] was used to process the data and generate a peak matrix in centroid mode. The default processing parameters were used. The Signal-to-Noise Ratio (SNR) threshold was set to 5 and the Savitzky–Golay smoothing had a kernel size of 7. Peaks appearing in less than 5% of the pixels were filtered out. Peaks within a window of 6 data-points or scans were binned together as the same mass peak. Mass spectra were re-calibrated using the Ag reference peaks as reference masses [21].

Datasets 13 and 14 were acquired in centroid mode with an Orbitrap mass spectrometer. These datasets were directly submitted to the binning process of rMSIproc [21] to conform to the peak matrix format.

No data normalization was performed. Data were visualized and explored using rMSI [22].

Algorithm description

Input and output format

The matrix-related annotation algorithm takes the peak matrix in centroid mode and the processed spectral data in profile mode as input. The user must also provide the chemical formulae of the matrix applied and a list of possible adducts and neutral losses to consider. The choice of adducts and neutral losses to consider is purely application dependent and is therefore left to the user.

The algorithm produces a vector containing the similarity scores that indicate the likelihood of each mass in the input image being a matrix-related ion. The package also provides an informative visual report for the user to understand the justification behind the classification. Additional file 1: Figures S1–S4 show examples of the visual report.

In-silico cluster and adduct calculation

The theoretical mass and relative isotopic pattern intensities of all possible matrix-related silver clusters (“\(Ag_{n}^{ + }\) theoretical cluster” in this work) are calculated using the open-source package enviPat [23], a fast and memory-efficient algorithm to compute theoretical isotope patterns.

For each \(Ag_{n}^{ + }\) theoretical cluster \(t_{i}\) its experimental counterpart \(e_{i}\) is obtained from the mean spectra of the dataset. The experimental masses closest to the theoretical ones within a given tolerance specified by the user are used. The \(Ag_{n}^{ + }\) theoretical clusters will then be matched against their experimental counterparts and their presence in the experimental dataset assessed using two similarity metrics. In the event of finding more than one experimental match within the tolerance, all possible options will be evaluated. The experimental cluster \(e_{i}\) with the highest similarity metrics is selected. From our experience, this is an unlikely event mostly associated with an exceedingly high tolerance threshold.

Similarity metrics

The similarity between each theoretical matrix-related cluster and experimental clusters is assessed using two similarity scores according to Eq. 1.

$$S = S_{1} \cdot S_{2}$$
(1)

where \(S\) is the total similarity score, \(S_{1}\) is the cluster spectral similarity (i.e. similarity between the experimental and theoretical isotopic intensity patterns) and \(S_{2}\) is the intra-cluster morphological similarity (i.e. similarity between the spatial distribution of the experimental ions). Both similarity scores range from 0 to 1.

The cluster spectral similarity score \(S_{1,i}\) for theoretical cluster \(t_{i}\) determines the degree of similarity between the scaled intensity vectors of intensities \(I_{{t_{i} }}\) and \(I_{{e_{i} }}\) and it is computed according to Eq. 2.

$$S_{1,i} = e^{{ - dist\left( {\frac{{I_{{t_{i} }} }}{{\hbox{max} \left( {I_{{t_{i} }} } \right)}},\frac{{I_{{e_{i} }} }}{{\hbox{max} \left( {I_{{e_{i} }} } \right)}}} \right)}}$$
(2)

where \(dist\left( {a,b} \right)\) is the distance function chosen by the user (Euclidean distance by default), \(I_{{t_{i} }}\) is the vector of intensities of the theoretical cluster \(t_{i}\) and \(I_{{e_{i} }}\) is the vector of intensities of experimental cluster \(e_{i}\). Experimental cluster \(e_{i}\) is determined by accessing the element in the peak matrix with a mass corresponding to \(t_{i}\) within a given tolerance. In plain terms, \(S_{1}\) is a decaying exponential function of the distance between the intensity scaled intensity vectors \(I_{{t_{i} }}\) and \(I_{{e_{i} }}\).

The intra-cluster morphological similarity \(S_{2,i}\) returns the degree of similarity between the spatial distributions of the ions conforming the experimental cluster \(e_{i}\). Ions with a high spatial correlation are more likely to belong to the same cluster. This metric is computed using Eq. 3.

$$S_{2,i} = \frac{{I_{{t_{i} }}^{'} \cdot I_{{t_{i} }} \cdot correl\left( {Images_{{e_{i} }} } \right)}}{{(\sum I_{{t_{i} }} )^{2} }}$$
(3)

where \(I_{{t_{i} }}\) is the intensity vector of the theoretical cluster \(t_{i}\), \(correl\left( A \right)\) is the correlation function specified by the user (Pearson correlation by default) and \(Images_{{e_{i} }}\) is the set of images corresponding to each ion in the experimental cluster \(e_{i}\). In plain terms, \(S_{2}\) is the weighted mean across both directions of the correlation matrix between each ion image in \(e_{i}\).

Overlapping peak detection

Insufficient resolving power leads to overlapped MS signals, which can be a severe problem in matrix-related peak annotation as they can lead to a greater number of misclassified peaks. This is a particularly limiting issue in lower resolution spectrometers such as some TOFs in contrast to higher resolution analysers such as Orbitrap or FTICR [24]. An additional problem with the same effect is the intrinsic inability of mass spectrometry to distinguish between isobaric species. In order to cope with these issues, we propose an overlapping detection algorithm capable of determining if a given MS signal corresponds to more than one overlapped ion peaks.

The overlapping detection algorithm is only executed in those clusters that report S1 and S2 scores under a threshold specified by the user. Before concluding that the cluster is not present, the algorithm determines whether the low similarity metrics could be attributed to the presence of overlapped signals.

The algorithm is based on the operating principle of bisecting k-means [25]. All the ions in an experimental cluster \(e_{i}\) are split into two subgroups (\(e_{i:1}\) and \(e_{i:2}\)) based on the correlation of their spatial distributions using k-means. For each subgroup of ions the similarity metrics S1 and S2 are recomputed. If the S1 and S2 scores of a given subgroup surpass the specified threshold, all ions in the subgroup are tagged as matrix-related. The remaining ions in \(e_{i}\) are tagged as matrix-related but suffering from overlapping, and the overlapping detection algorithm terminates. If instead, none of the subgroups obtains an S1 and S2 above the threshold, the process of splitting into two subgroups by k-means and recomputing the similarity scores is repeated for both \(e_{i:1}\) and \(e_{i:2}\). This bisection of the ions in \(e_{i}\) is repeated iteratively until a subgroup obtains S1 and S2 scores above the threshold. To prevent overfitting, the iterative process will also stop when the number of peaks contained by the biggest subgroup becomes smaller than half the amount of peaks in \(e_{i}\). In such event, it is concluded that there are no overlapped peaks and all ions in the experimental cluster \(e_{i}\) are tagged as not-matrix-related. To sum, overlapped MS signals will be detected and distinguished from the rest of the ions in the cluster based on the dissimilarity of their spatial distributions.

Results

Algorithm validation with AgLDI MSI

In order to validate and optimize the algorithm, we opted to use sample tissues covered by silver nanoparticles, a well-defined and understood LDI promoting material. A total of 14 datasets, from 3 different laboratories, were used. The datasets included several animal tissues, plant tissues and human fingermarks.

The algorithm was challenged with the task of classifying a list of silver-containing compounds and adducts for each dataset. The list includes a “positive class” formed by clusters that should be present in all samples used in this study and a “negative class” containing clusters that should not be present in any of them. This list is referred to as “validation list” and allowed us to assess the performance of the algorithm. An algorithm with a perfect performance should classify all clusters in the “positive class” as matrix-related signals and all clusters in the negative class as not present and thus not-matrix related. This is a common approach in bioinformatics for validating and assessing the performance of a classifier algorithm [26]. Table 2 shows the complete validation list.

Table 2 “Validation list” used for validation

Silver clusters containing up to 60 atoms have been reported to form during silver sputtering [27]. The “positive class” expected to be found in all datasets is therefore formed by all silver clusters within the acquired mass range. For most of the datasets, this includes clusters from \(Ag_{1}^{ + }\) to \(Ag_{10}^{ + }\). Given the high heterogeneity in adduct formation of the samples used (i.e. the possibility of biological compounds from the tissue to form adducts with silver cations), no silver adducts were included in the “positive class”.

The “negative class” consists of silver compounds or adducts that should not be present in any of the samples used in this study. Firstly, this list includes various silver neutral salts which cannot be measured using LDI MSI, and some synthetic compounds that are not expected to be present in animal or plant samples [28]. It also includes compounds found in aerial parts of plants, wax and insects (not found in mammal tissues nor in corn root) that have been reported to form adducts with silver in AgLDI MSI applications [29]. For each of these molecules, we also included all clusters within the acquired mass range. These particular molecules and their clusters were selected in an attempt to have a “negative class” covering the full mass range.

Performance of similarity scores

Using the validation list described in “Algorithm validation with AgLDI MSI” section, we assessed the performance of the similarity scores as a classifier to annotate \(Ag_{n}^{ + }\)-related peaks in AgLDI MSI datasets.

Figure 1 shows the similarity scores obtained for each cluster in Table 2 when searched in all 14 datasets from Table 1. The blue points represent the “positive class” (clusters that should be present) while the red points represent the negative class (clusters that should not be present). The tolerance threshold was set to 4 data-points or scans.

Fig. 1
figure 1

Similarity scores performance a Spectral similarity S1 vs. Intra-cluster morphological similarity S2 scatter plot. Each point represents a potential cluster classified by the algorithm. All clusters shown in Table 2 are evaluated for all 14 datasets presented in Table 1. Blue points represent the “positive class” (should be present in the sample) while the red points correspond to the negative class (should not be present in the sample). Most “positive class” points are located in the top right corner well separated from the negative class points. This indicates proper classification power. b Precision and recall (PR) curve computed according to Davis et al. 2006 [30]. c Similarity score S1·S2 vs. Cluster number. Clusters are arranged in decreasing order of mean similarity score. A clear gap between an S of 0.5 and 0.7 separates the “positive class” from the negative class. Refer to Additional file 1: Table S1 for a mapping of cluster numbers to cluster chemical formula

Figure 1a represents the spectral similarity score (\(S1\)) against the intra-cluster similarity score (S2) of each of these clusters. The “positive class” is clearly separated on the top right corner (high \(S1\) and high \(S2\)).

To evaluate the classifying performance of the two similarity metrics we use the Precision vs. Recall (PR) curve [26]. The precision is defined as the ratio between the number of clusters in the “positive class” classified as matrix-related (i.e. true positives) and the total number of clusters classified as matrix-related (i.e. true positives + false positives). The recall, on the other hand, is the ratio between the number of clusters in the “positive class” classified as matrix-related (i.e. true positives) and the total number of clusters in the “positive class” (i.e. true positives + false negatives). Figure 1b shows the PR curves for each of the similarity metrics proposed. The areas under the curve (AUC) of 0.97 and 0.91, respectively, show that the spectral similarity score S1 is the best classifier followed by the intra-cluster morphology similarity score \(S2\). The product of \(S1 \cdot S2\) had the same classifying skill as \(S1\) with an AUC of 0.97. These results prove that \(Ag_{n}^{ + }\)-related peaks can be well classified by these two metrics.

\(S1\) performs much better than \(S2\) as a classifier, and the product of \(S1 \cdot S2\) matches but does not improve the performance of \(S1\) alone. Nevertheless, we still decided to use the product of \(S1 \cdot S2\) as a classifier in rMSIcleanup instead of using \(S1\) alone due to three main reasons. Firstly, the overlapping detection algorithm strongly relies on the morphological similarity of ions and thus depends on \(S2\). Moreover, even though we did not find a single instance of a cluster with a high \(S1\) score and a low \(S2\) score (matching isotopic patterns but unmatching spatial distributions) in any of the samples, we still consider that \(S2\) should be present to allow for correct classification should this occur. Finally, \(S2\) can be a strong asset in applications other than AgLDI MSI where, due to less distinctive isotopic ratios, the performance of \(S1\) as a classifier is diminished.

Figure 1c shows the similarity score S1·S2 obtained by each cluster in all datasets. Clusters are arranged in decreasing order of mean similarity score. Additional file 1: Table S1 maps the cluster numbers to cluster chemical formula. A clear gap between an \(S\) of 0.5 and 0.7 separates the “positive class” from the negative one.

Only three false positives (i.e. clusters that should not be present but have a high \(S\) value) were reported for adduct \([{\text{C}}_{28} {\text{H}}_{58} {\text{O}} + {\text{Ag}}]^{ + }\). An example is shown for Dataset 4 in Additional file 1: Figure S5. Identification by MS/MS is required to assess if the compound is indeed present in the sample. Nevertheless, the mass error between experimental and theoretical isotopic patterns for this compound was 154 ppm, an error much higher than the expected for this dataset (acquired with a TOF MS analyzer). Therefore, we inferred that the experimental pattern detected is not related to adduct \([{\text{C}}_{28} {\text{H}}_{58} {\text{O}} + {\text{Ag}}]^{ + }\) and this is, in fact, a false positive. In order to reduce the number of false positives, the mass tolerance of the algorithm can be decreased, however, a too strict mass tolerance increases the number of false negatives.

A total of six false negatives (i.e. clusters that should be present but have a low \(S\) value) were reported for some datasets for clusters \(Ag_{3}\), \(Ag_{6}\) and \(Ag_{10}\). False negatives correspond to clusters for which the majority of peaks in their isotopic pattern were under the SNR threshold, and thus were excluded during pre-processing. In these cases, the few included peaks were not sufficient to reliably annotate the cluster. Additional file 1: Figure S6 shows the only exception, the \(Ag_{6}\) cluster in Dataset 12, whose misclassification is not due to intensity problems. In this case, the fingerprint analysed showed highly homogeneous ion images, which impedes the proper operation of the overlapping algorithm and leads to misclassification. Representative examples of correct annotations are shown in Additional file 1: Figures S7–S8.

As an additional validation, the results were matched against the published annotations of the datasets provided by external laboratories. Dataset 12 contains 60 identifications by MS/MS [14]. Dataset 13 contains 4 metabolites identified by MS/MS and a total of 10 tentatively identified formulae based on exact mass [6]. Dataset 14 contains 10 metabolites identified by MS/MS and 6 tentatively identified formulae based on exact mass [6]. None of these endogenous signals was misclassified as \(Ag_{n}^{ + }\)-related by our algorithm.

Overlapping peak detection performance

Figure 2 shows a case example where the overlapping peak detection algorithm successfully identified overlapping ions when searching for the \(Ag_{6}\) cluster in Dataset 1. Figure 2a depicts the experimental mean profile spectrum in the mass range of interest along with the calculated profile of the \(Ag_{6}\) cluster. While most peaks follow the calculated isotopic distribution, experimental peaks at m/z 641.43, m/z 643.43 and m/z 653.43 are considerably more intense than in the predicted pattern. This generates a mismatch between the experimental and calculated peaks that leads to a low \(S1\) score. Figure 2b shows the spatial distributions of each of the ions in the \(Ag_{6}\) cluster. The correlation map in Fig. 2d clearly indicates that peaks at m/z 641.43 and m/z 643.43 have a spatial distribution that is unlike that of the rest of the ions in the cluster. The peak at m/z 653.43 also shows a considerably different spatial correlation to the rest. These low correlations lead to a lower \(S2\) score. Figure 2c is a zoom-in of the peaks at m/z 641.43 and m/z 643.43 showing that the silver ion peaks are clearly overlapped with Ag-unrelated signals.

Fig. 2
figure 2

Overlapping detection algorithm performance when searching for the \(Ag_{6}\) cluster in Dataset 1. a Comparison between the mean experimental spectra and the theoretical \(Ag_{6}\) cluster at the \(Ag_{6}\) cluster masses within a tolerance of 4 scans. Red and blue represent theoretical and experimental profiles, respectively. As can be seen, while the peaks in the centre of the cluster perfectly match the theoretical ratios, the peaks on the edges differ considerably. b Spatial distributions of the experimental cluster peaks. After performing the overlapping detection only the four ion images in the centre in green are classified as Ag-related. The remaining ion images in red are classified as \(Ag^{n + }\) suffering from overlapping. The morphologies of the \(Ag^{n + }\) overlapped ions (red) differ from the ones without overlapping (green) due to ion overlapping. c Correlation matrix between the experimental ion images of the \(Ag_{6}\) cluster. The ion image number corresponds to the position of the ion in the isotopic pattern in ascending order of m/z. The first two images are clearly not correlated with the remaining images of the cluster. The last image also shows a considerably lower correlation. d Zoom-in of experimental mean spectra. Peaks m/z 641.43 and m/z 643.43 show clear overlapping

Initially, given the low S1 and S2 scores, all peaks in the \(Ag_{6}\) cluster were misclassified as not \(Ag_{n}^{ + }\)-related. Using the overlapping detection algorithm, the peaks at m/z 645.43, m/z 647.43, m/z 649.43 and m/z 651.43 were correctly tagged as belonging to \(Ag_{6}\). Peaks at m/z 641.43, m/z 643.43 and m/z 653.43 were tagged as related to \(Ag_{6}\) but with overlapping.

Additional file 1: Figure S9 explores the effects of overlapping peak detection on overall performance. Two main differences can be appreciated. Firstly, there is an overall increase in the \(S_{1} \cdot S_{2}\) score obtained by the “positive class” which leads to a bigger gap between the “positive class” and the “negative class” making the thresholding classification more robust. This is due to the identification of some overlapping peaks in the \(Ag_{n}^{ + }\) clusters. Additionally, there is a clear improvement in the scores obtained by the \(Ag_{6}\) cluster. The \(Ag_{6}\) cluster suffers from overlapping in most of the datasets and is, therefore, the cluster most benefitted from the overlapping detection algorithm. It is also important to note that the overlapping peak detection algorithm does not add any false positives as the \(S_{1} \cdot S_{2}\) remains unchanged for the “negative class”. This proves that overlapping detection leads to less misclassification of \(Ag_{n}^{ + }\)-related peaks.

Matrix-related peak annotation improves the post-processing

In order to explore the influence of the annotation and removal of matrix-related peaks in the post-processing workflows, we carried out a multivariate statistical exploratory analysis. The widely used linear algorithm Principal Component Analysis (PCA) [31] was performed on all 14 datasets before and after removal of the \(Ag_{n}^{ + }\) peaks. Given that the features in an MSI experiment have a direct physical relationship [32, 33], prior to PCA the data was centred and no scaling was performed. We then compared the quality of the spatial representation of the first three principal components. Given the lack of a standard quantitative metric to compare the quality of two images in MSI, we followed the trend established by recent work [11, 34, 35] and performed a qualitative visual comparison.

Figure 3 shows the results of this exploratory analysis on Dataset 2 and Dataset 11. In the pancreatic tissue represented in Fig. 3a (Dataset 2), PC1 did not change significantly after matrix removal, while PC2 and PC3 showed a wider variety of morphologies on the tissue after the \(Ag_{n}^{ + }\) interference was removed. In the brain tissue shown in Fig. 3b (Dataset 11) the contrast enhancement is even clearer in the three PCs. Before the \(Ag_{n}^{ + }\) peaks were removed, PC1 and PC3 did not capture any substantial morphology but afterwards, they did and PC2, which already showed morphological information, did so with increased contrast. To convey the three principal components in a single picture we encoded each of them as a colour in the Red Green Blue colour model (RGB). The RGB picture became richer and more informative after the \(Ag_{n}^{ + }\) peaks were removed. Similar results were obtained in the remaining 12 datasets and their corresponding images can be accessed in Additional file 1: Figures S10–S13.

Fig. 3
figure 3

Exploratory analysis with PCA before and after removing matrix-related peaks. Red, green and blue are used to represent the spatial distribution of PC1, PC2 and PC3, respectively. The last column uses the Red Green Blue colour model (RGB) to represent the first three principal components in a single image. The annotation and removal of the matrix-related peaks lead to a generalized improvement in the contrast of morphological structures in all principal components. a Pancreas tissue from Dataset 2. b Brain tissue from Dataset 11 [18]

As shown in Additional file 1: Table S2 and Figure S14 the removed \(Ag_{n}^{ + }\) peaks can represent a substantial fraction of the total number of features. Additional file 1: Figure S15 shows the same PCA analysis compensating for the lower number of features after removing the \(Ag_{n}^{ + }\) peaks by reducing the original dataset to the most intense peaks or by random feature selection. The results still show a clear improvement of the morphological contrast after removal of the \(Ag_{n}^{ + }\) peaks regardless of the feature reduction in the original dataset.

The main conclusion that can be drawn from the visual analysis of these results is that the removal of matrix-related peaks leads to a generalized enhancement in the contrast of morphological structures obtained with the first principal components. This is due to the fact that the variance contribution of the matrix-related signals is not fed to the PCA and therefore the resulting principal components are better focused on the morphology of the tissue. In agreement with previous work on the effects of MSI data reduction [36], these results demonstrate that the removal of matrix-related signals improves post-processing, especially when using linear algorithms such as the widely used PCA.

Performance comparison to blank subtraction

In order to quantify the improvement of rMSIcleanup over previous alternatives, we compared its performance to the widely used “blank subtraction”. Datasets 9 and 10 were used to perform such a comparison. In both cases, a Region Of Interest (ROI) outside of the tissue region was defined. The basic principle of “blank subtraction” relies on discarding signals found in these off-sample regions under the assumption that they are matrix-related. Additional file 1: Figure S16 shows the two ROIS defined. In the case of Dataset 9, an ROI close to the tissue with apparent signs of metabolite delocalization was selected. The ROI for Dataset 10, on the other hand, was selected in a region that was far enough from the tissue and clean. Additional file 1: Figure S17 compares the mean spectra of these ROIS to the mean spectrum of their respective datasets. It can be appreciated that overall the intensities are noticeably lower in the off-sample ROIs, especially in the clean ROI defined for Dataset 10. In this case, due to the thin layers of silver used in Ag-LDI, the spectrum is orders of magnitude less intense.

Three different metrics were evaluated to determine the presence of a peak in the off-sample ROI and thus label it as Ag-related by the “blank subtraction” algorithm. The three standard metrics used were: intensity fold-change between in and off-sample, off-sample intensity and the off-sample SNR. Additional file 1: Figure S18 shows the precision vs recall curves obtained using “blank subtraction” based on the three metrics. The highest AUC of 0.61 reported for Dataset 10 using intensity as the classification metric was well below the reported AUC of 0.97 for rMSIcleanup.

As an example, analysed blank subtraction with a threshold of 10% of the maximum intensity (the top-performing metric). This resulted in 9 signals correctly classified as Ag-related (true positives TP), 141 correctly classified as not Ag-related (true negatives TN), 5 misclassified not Ag-related signals (false positives FP) and 19 misclassified Ag-related signals (false negatives FN). These results are associated with a poor false discovery rate (FP/(FP + TP)) of 35.7% and false omission rate (FN/(FN + TN)) of 11.88%. The blank subtraction method misclassified as matrix-related three metabolites with potentially relevant biological information: choline (C5H14NO; as \(\left[ {M + H - H_{2} O} \right]^{ + }\)m/z 86.09); cholesterol (C27H46O; as \(\left[ {M + ^{107} Ag_{ } } \right]^{ + }\) and \(\left[ {M + ^{109} Ag_{ } } \right]^{ + }\)m/z 493.24 and m/z 495.24 respectively) and an unidentified compound (\(\left[ {M + ^{107} Ag_{ } } \right]^{ + }\) and \(\left[ {M + ^{109} Ag_{ } } \right]^{ + }\), m/z 538.49 and m/z 540.49 respectively). On the other hand, several \(\left[ {Ag_{n} } \right]^{ + }\) clusters (from n = 4 to n = 9) were overlooked by the blank subtraction method but were properly classified by rMSIcleanup.

Discussion and conclusion

The goal of this study was to develop, optimize and validate a new algorithm to annotate signals attributed to the LDI promoting material in MSI. The developed algorithm is packaged and released as rMSIcleanup, an open-source R package freely available for the scientific community and fully integrated with rMSIproc [20], a stand-alone package for the visualization, pre-processing and analysis of MSI datasets.

As demonstrated, the widely used “blank subtraction” approach is outperformed by rMSIcleanup in the annotation Ag-related signals. In comparison to the top-performing alternatives for matrix-related peak annotation which are based on machine and deep learning [11], rMSIcleanup has the main advantage of using two intuitive scores (accounting for the isotopic ratios of clusters and the spatial distribution of their ions) and providing a visual justification of each annotation. This is a key contribution as it helps overcome the black-box problem, increases the user’s confidence in the annotation and can help researchers optimize experimental workflows (for instance, choosing LDI promoters that minimize interferences in the m/z range of interest). Another merit of our work is that, to our knowledge, it is the first matrix signal annotation algorithm to explicitly detect and deal with overlapping MS signals, which successfully prevents overlapped peaks from being misclassified. Given that we follow a targeted analytical approach, our classification is focused only on matrix-related signals while the algorithms presented by Ovchinnikova et al. [11] have a broader scope and also classify as off-sample other exogenous compounds. In the era of big data, these two apparently opposite approaches (namely our analytical approach based on chemical similarity scores and their untargeted approach based on machine learning) must not only coexist but also complement each other following the trend already initiated in other fields [37]. This reality urges the MSI community to develop annotation algorithms capable of, not only exploiting the knowledge in the increasingly large amounts of MSI datasets available, but also incorporating metrics that take into account the chemical context of the sample to aid transparent justification.

AgLDI MSI was chosen to validate the algorithm, due to the well-understood ionization of silver. A “validation list” was compiled from the literature, which included silver clusters that should be present in all samples and silver adducts or compounds that should not be present in any of them. Given the heterogeneity of the samples used in this study, the described validation list was adapted to each dataset. For each dataset, those clusters in the validation list for which the experimental data contained none of their theoretical masses were excluded. These adjustments in the validation list prevented an overestimation of the performance of the algorithm attributed to a high number of correctly classified “negative class” clusters (i.e. true negatives) located in mass ranges with no signal. We propose this validation strategy as a novel alternative to more common validation approaches such as chemical standards [6] or expert annotation [11, 35]. This study adds to previous work [6, 14, 17, 29, 38] and further demonstrates the potentiality of AgLDI MS imaging, a thriving technology known for its reduced background signals in spatial metabolomics that is strongly complemented by our annotation algorithm as it further removes the influence of the matrix.

In agreement with previous work on the effects of MSI data reduction [36], we have demonstrated that the annotation and removal of signals related to the LDI promoting material used can further enhance post-processing, due to the elimination of variables attributed to exogenous compounds that do not reflect the morphology nor chemical composition of the sample. These results highlight the need to include software annotation tools such as rMSIcleanup in MSI workflows before exploring the datasets with classical data analysis techniques used in metabolomics. Here we would like to emphasize the need for a standardized quantitative metric to assess the quality of MSI images and we acknowledge the relevance of standardization initiatives such as the MALDISTAR project (http://www.maldistar.org).

We envision two main applications for rMSIcleanup. On the one hand, it can be used in a purely exploratory fashion to better understand ionization and adduct cluster formation in new matrices, tissues and applications. In this case, the user is advised to add a long list of potential adducts or neutral losses to assess their formation. The validation approach followed in this paper is a clear example of this exploratory application of rMSIcleanup. A second application is the automated peak annotation of well-known matrices and tissues. In this case, only the clusters that are known to be formed need to be given to the software. This curated selection increases the data-processing speed. The set of matrix-related annotated peaks can then be eliminated from the dataset prior to performing post-processing workflows such as multivariate statistical analysis. In any case, the choice of adducts and neutral losses to consider (or matrix adducts with endogenous compounds, e.g. fatty acids + Ag) is application dependent and is therefore left to the user. This list must be manually specified as an input parameter to rMSIcleanup.

Finally, the promising results obtained in the annotation of \(Ag_{n}^{ + }\)-related peaks in AgLDI MSI open the door to the extension of this methodology to more widely used matrices such as 2,5-dihydroxybenzoic acid (DHB), 1,5-diaminonaphthalene (DAN), and 9-aminoacridine (9AA) among others. These organic matrices pose greater challenges. Firstly, they lead to increased matrix background due to their greater fragmentation and adduct formation [39,40,41] and the higher quantities in which they are added [40]. Moreover, they present the problem of “hot spot” formation given their less homogeneous application process [42]. These issues highlight not only the benefits of AgLDI MSI but also that matrix-related peak annotation can benefit data post-processing even further in applications using organic matrices.