The global workflow aims to align and normalize the data prior to sequence decoding, as illustrated in Fig. 3. The challenges lie in (i) image registration, compensating for alignment shifts between sequencing cycles and for chromatic shift, (ii) signal detection and normalization, and (iii) signal decoding. The signal-to-noise ratio (SNR) in the images is limited by the trade-off between exposure time during image acquisition and bleaching of the stains: the longer the exposure time, the higher the SNR, but also the higher the risk of bleaching signals in neighboring focal planes. In order to detect as many true signals as possible, we adopted an inclusive approach to signal detection. Following signal decoding, noise and true signals were discriminated using a quality measurement, as described in Sect. 3.3.
3.1 2D Approach
For the 2D approach, we used our previously published method [3, 6], implemented in the TissueMaps workflow [7]. TissueMaps is built on free and open source tools, and the analysis workflow makes use of the CellProfiler software [8]. The 2D analysis started with the MIP of the image stack (reducing the z-dimension to one). Following the MIP, for each cycle (t), each image channel (\(I_{tc}\), c representing either the general stain or one of the four letters of the genetic code) was first enhanced by a top-hat transformation (\(F_{tc}\)) with a structuring element (B) consisting of a disc with radius 10 pixels:
$$\begin{aligned} F_{tc}= (I_{tc} - I_{tc} \circ B), \end{aligned}$$
(1)
where \(\circ \) is a morphological opening. Individual signals were then defined by a labeled mask (L) in the general stain channel (D) of the first sequencing cycle (\(t=1\)), using a fixed intensity threshold low enough to detect the signals after the top-hat (here, equal to 0.5). Clustered signals were then separated by shape-based watershed segmentation, i.e., a watershed applied on the negated distance transform, since signals are relatively circular [9]. Filtered images (\(F_{tc}\)) from the same sequencing cycle were thereafter registered (\(R_{tc}= \mathrm{registered}(F_{tc})\)) towards the general stain using a rigid-body transformation (preserving the distance between every pair of points), from the “MultiStackReg” plugin for Fiji [10]. We applied the final mask representing the signals (L) on \(R_{tc}\), so that \(L_s\) is the set of pixels representing signal s in L. Finally, the intensity (\(S_{stc}\)) for each signal (s) in each channel (c) and time step (t) is defined in the 2D method as the maximum fluorescence intensity:
$$\begin{aligned} S_{stc}=\max _{p \in L_{s}}{(R_{tc})_p}\; \end{aligned}$$
(2)
For each signal, we extracted the (x, y) location as well as the intensity at this location in each of the five color channels (general stain and four letters of the genetic code) and each of the five time steps, in order to later decode and evaluate the signal as described in Sect. 3.3.
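As an illustration of Eqs. (1) and (2), the following is a minimal sketch and not the TissueMaps/CellProfiler implementation used in this work. It assumes NumPy and SciPy are available; the function names `disc`, `tophat_mip` and `signal_intensities` are hypothetical, and the watershed separation and the registration step are omitted, so the input channels are assumed to be already registered.

```python
import numpy as np
from scipy import ndimage as ndi


def disc(radius):
    """Boolean disc-shaped structuring element B with the given radius in pixels."""
    y, x = np.ogrid[-radius:radius + 1, -radius:radius + 1]
    return x * x + y * y <= radius * radius


def tophat_mip(stack, radius=10):
    """MIP along z, then white top-hat F = I - (I opened by B), as in Eq. (1)."""
    mip = stack.max(axis=0)
    return ndi.white_tophat(mip, footprint=disc(radius))


def signal_intensities(general_stain, channels, threshold=0.5):
    """Label signals in the filtered general stain and take per-label maxima, Eq. (2).

    `channels` is a list of filtered, already registered 2D images R_tc.
    The watershed split of touching signals is omitted in this sketch.
    """
    labels, n = ndi.label(general_stain > threshold)
    index = np.arange(1, n + 1)
    maxima = [ndi.maximum(r, labels=labels, index=index) for r in channels]
    return labels, np.stack(maxima, axis=1)  # shape (n_signals, n_channels)
```

Each row of the returned array holds the values \(S_{stc}\) of one signal for the channels passed in, in the same order.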
3.2 3D Approach
In the 3D approach, signals were separately detected in all color channels at all time steps using a local thresholding approach referred to as Per Object Ellipsefit (POE) [11]. The POE method computes local adaptive thresholds for each individual object (signal) where the threshold values are set to optimize the ellipse (ellipsoid in 3D) fit. This is done by creating a component tree [12] and traversing the pixels in order of decreasing intensities. Ellipsoid fit is defined by computing the moment matrix M for each object, extracting the axes from the eigenvalues of M, and computing the ratio between the actual object volume and an ideal ellipsoid with the dimensions given by these axes. The search for the best ellipsoid fit is done within given ranges for object volume (36–96 voxels), major and minor axis length (3–8 pixels), and value of the ellipsoid fit (\(\ge \)0.5).
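The ellipsoid-fit measure can be sketched as follows. This is a minimal illustration under our own assumptions, not the POE implementation of [11]: the moment matrix is taken as the covariance of the voxel coordinates, and the semi-axes are derived from the fact that a solid ellipsoid with semi-axis a has variance \(a^2/5\) along that axis. The function name `ellipsoid_fit` is hypothetical, and the threshold search over the component tree is not shown.

```python
import numpy as np


def ellipsoid_fit(mask):
    """Ratio of object volume to the volume of its moment-equivalent ellipsoid.

    `mask` is a 3D boolean array marking the voxels of one object.
    """
    coords = np.argwhere(mask).astype(float)      # (n_voxels, 3)
    centered = coords - coords.mean(axis=0)
    m = centered.T @ centered / len(coords)       # second-moment matrix M
    eigvals = np.linalg.eigvalsh(m)
    # A solid ellipsoid with semi-axis a has variance a^2 / 5 along that axis.
    semi_axes = np.sqrt(5.0 * np.clip(eigvals, 0.0, None))
    ideal_volume = 4.0 / 3.0 * np.pi * np.prod(semi_axes)
    return len(coords) / ideal_volume
```

Under this definition a compact, ball-like object scores close to 1, whereas two separate blobs treated as one object score well below the 0.5 limit quoted above.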
Following signal detection, 3D spatial coordinates of detected signals were aligned and grouped. Within each time step (sequencing cycle) the color channels representing A, C, G, and T were affinely registered to the general stain of that same time step, using Iterative Closest Point (ICP) [13], followed by a spline based ICP version [14, 15], with a grid of 6 \(\times \) 6 \(\times \) 5 control points, that further corrects any chromatic aberration. Once the channels were aligned within each cycle, the general stain of each cycle was aligned with the general stain of the first cycle, used as a reference, utilizing rigid ICP registration. The associated channels of the time step were aligned using the same transformation.
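The rigid ICP step between cycles can be illustrated with the sketch below. It is a simplified stand-in, not the registration code of [13-15]: it covers only the rigid case, uses brute-force nearest-neighbour matching, and runs a fixed number of iterations. The function names `best_rigid_transform` and `rigid_icp` and the parameter `n_iter` are our own; the affine and spline-based variants are not shown.

```python
import numpy as np


def best_rigid_transform(src, dst):
    """Least-squares rotation R and translation t with dst ~ src @ R.T + t (Kabsch)."""
    src_c, dst_c = src.mean(axis=0), dst.mean(axis=0)
    h = (src - src_c).T @ (dst - dst_c)
    u, _, vt = np.linalg.svd(h)
    d = np.sign(np.linalg.det(vt.T @ u.T))        # guard against reflections
    r = vt.T @ np.diag([1.0, 1.0, d]) @ u.T
    return r, dst_c - r @ src_c


def rigid_icp(moving, reference, n_iter=20):
    """Align `moving` (n, 3) to `reference` (m, 3); returns the aligned points."""
    current = moving.copy()
    for _ in range(n_iter):
        # Brute-force nearest neighbour in the reference for every moving point.
        d2 = ((current[:, None, :] - reference[None, :, :]) ** 2).sum(axis=2)
        matched = reference[d2.argmin(axis=1)]
        r, t = best_rigid_transform(current, matched)
        current = current @ r.T + t
    return current
```

The brute-force distance matrix is quadratic in the number of points; for large signal counts a spatial index would be the more practical choice.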
Due to digitization effects and noise, slight shifts in the detected signals, for different color channels and time steps, remain even after the registration. Detected signals closer to each other than 3.4 pixels were merged into one spot. As (x, y) location of a merged signal s we use the centroid of the corresponding cluster of (registered) signals. The intensity values of all merged signals were extracted from the smoothed (Gaussian filter with \(\sigma =0.5\)) and dilated (ball-shaped structuring element with a diameter of five pixels) original images, utilizing the inverse of the respective registration transformations.
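The merging of nearby detections can be sketched as below. The grouping rule is an assumption on our part: the text only gives the 3.4-pixel distance, and the sketch links any two detections closer than that (single linkage via union-find) before taking cluster centroids. The function name `merge_close_signals` is hypothetical.

```python
import numpy as np


def merge_close_signals(points, max_dist=3.4):
    """Group points linked by distances below `max_dist`; return cluster centroids."""
    points = np.asarray(points, dtype=float)
    parent = list(range(len(points)))

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]   # path halving
            i = parent[i]
        return i

    d = np.linalg.norm(points[:, None, :] - points[None, :, :], axis=2)
    for i, j in zip(*np.nonzero(np.triu(d < max_dist, k=1))):
        parent[find(int(i))] = find(int(j))

    roots = np.array([find(i) for i in range(len(points))])
    return np.array([points[roots == r].mean(axis=0) for r in np.unique(roots)])
```

With single linkage, a chain of detections each closer than the limit to its neighbour ends up in one cluster even if the chain's end points are further apart.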
Intensity measures were normalized separately for each channel (c) and time (t), such that signals with a brightness equal to the mode of the respective image volume get the value zero, and the mean detected signal intensity is mapped to the value one:
$$\begin{aligned} S_{stc}=\frac{R_{stc}-mode(R_{tc})}{\frac{1}{N_s}(\sum _{l=1}^{N_{s}} R_{ltc})-mode(R_{tc})}, \end{aligned}$$
(3)
where \(S_{stc}\) is the intensity of signal s in channel c and time t for the 3D method, and \(N_s\) is the total number of detected signals.
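Eq. (3) can be written out as the short sketch below, for one channel and one cycle at a time. It assumes integer grey levels so that the mode of the volume is well defined; the function name `normalize_intensities` is hypothetical.

```python
import numpy as np


def normalize_intensities(signal_values, volume):
    """Eq. (3): map the volume's mode to 0 and the mean signal intensity to 1.

    `signal_values` holds R_stc for all signals s of one channel and cycle;
    `volume` is the corresponding image volume with integer grey levels.
    """
    values, counts = np.unique(volume, return_counts=True)
    mode = values[counts.argmax()]
    signal_values = np.asarray(signal_values, dtype=float)
    return (signal_values - mode) / (signal_values.mean() - mode)
```

For example, with a background mode of 10 and two signals of intensity 20 and 40, the normalized values are 0.5 and 1.5, whose mean is one as required.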
Due to the inclusive intensity threshold used for the signal detection, artifacts from random background noise may have been detected as well. After normalization, a quality check based on the general stain was applied to reduce such noise. We required each detected signal (s) to be present in the general stain channel (D) in every cycle (t), so that the following condition holds for all cycles:
$$\begin{aligned} \frac{S_{stc\mid c=D}}{\max _{c \in \{A,C,G,T\}}{S_{stc}}} \ge 0.1\; \end{aligned}$$
(4)
This step reduces the number of signals by approximately 2%.
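The check of Eq. (4) can be sketched as follows. The array layout (signals, cycles, channels in the order D, A, C, G, T) and the function name `passes_general_stain_check` are assumptions made for illustration.

```python
import numpy as np


def passes_general_stain_check(s, min_ratio=0.1):
    """Eq. (4): keep signals whose general stain is at least `min_ratio`
    times the strongest base channel in every cycle.

    `s` has shape (n_signals, n_cycles, 5) with channel order D, A, C, G, T.
    """
    general = s[:, :, 0]
    strongest_base = s[:, :, 1:].max(axis=2)
    # Same as the ratio test when the strongest base is positive,
    # but without dividing by zero.
    return (general >= min_ratio * strongest_base).all(axis=1)
```

The returned boolean mask can be used to index the signal array directly, e.g. `s[passes_general_stain_check(s)]`.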
3.3 Sequence Decoding and Quality Measurement
We measured the respective quality value for the 2D and 3D methods to evaluate the consistency of the signals detected. For each sequencing cycle, every location containing a signal is assigned the base, A, C, G or T, with the highest image intensity (following top-hat (2D) or normalization according to Eq. (3) (3D)). Autofluorescence may result in false signals that have a high intensity across all sequencing cycles, but always display the same color (that is, always appear in the same color channel). Such signals will appear as “homopolymers”, i.e., barcodes consisting of a single repeated letter, such as ‘AAAAA’ or ‘GGGGG’. No such barcodes were included in the expected barcodes, and they were removed from our set of detected signals, reducing the number of signals by 0.6%.
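Base calling and homopolymer removal can be sketched as follows. The array layout (signals, cycles, channels in the order A, C, G, T) and the names `call_barcodes` and `drop_homopolymers` are assumptions made for illustration.

```python
import numpy as np

BASES = np.array(list("ACGT"))


def call_barcodes(s):
    """Assign each cycle the base with the highest intensity.

    `s` has shape (n_signals, n_cycles, 4) with channel order A, C, G, T.
    """
    calls = BASES[s.argmax(axis=2)]              # (n_signals, n_cycles)
    return ["".join(row) for row in calls]


def drop_homopolymers(barcodes):
    """Remove barcodes made of one repeated letter, e.g. 'AAAAA'."""
    return [b for b in barcodes if len(set(b)) > 1]
```

Ties between channels are resolved by `argmax` in favour of the first channel in the order above, which is an arbitrary choice of this sketch.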
To evaluate the signals detected, a quality \(Q_{st}\) of a signal s, in the cycle t was defined as:
$$\begin{aligned} Q_{st}=\frac{\max _{c \in \{A,C,G,T\}}{S_{stc}}}{\sum _{c \in \{A,C,G,T\} }{S_{stc}}}\; \end{aligned}$$
(5)
The quality score of the full sequence \(Q_{s}\) of signal s is further defined by the quality of its “weakest” cycle:
$$\begin{aligned} Q_{s}=\min _{t \in \{1,2,\ldots ,N_t\}}{Q_{st}} \; \end{aligned}$$
(6)
For non-negative intensities, the quality score ranges from \(\frac{1}{4}\) (i.e., all four channel intensities equal) to 1 (all non-max signals equal to 0).
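Eqs. (5) and (6) translate directly into array operations. The sketch below assumes an array of shape (signals, cycles, 4) holding the four base channels; the function names are hypothetical.

```python
import numpy as np


def cycle_quality(s):
    """Eq. (5): Q_st = max_c S_stc / sum_c S_stc over the four base channels."""
    return s.max(axis=2) / s.sum(axis=2)


def sequence_quality(s):
    """Eq. (6): Q_s is the minimum of Q_st over all cycles."""
    return cycle_quality(s).min(axis=1)
```

As a worked example, a signal with channel intensities (1, 0, 0, 0) in one cycle and (2, 1, 1, 0) in another has cycle qualities 1 and 0.5, and therefore a sequence quality of 0.5.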
3.4 Crosstalk Compensation
Intensity values detected from each of the five sequencing cycles were crosstalk compensated in order to color-correct the intensities and determine the real dye concentration present in each signal. The sequencing cycles cannot be assumed to be independent of each other, and both the sequencing process and the image acquisition are affected by several kinds of cycle-dependent noise (e.g., focus, imperfect image registration, chromatic aberration, photobleaching, and other experimental conditions), meaning that the crosstalk between channels may vary from cycle to cycle. Therefore, a separate crosstalk compensation matrix was estimated for each of the sequencing cycles. Each crosstalk matrix \(X_t\) was estimated as in Sect. 2.2.6 of Li and Speed [16], inverted and multiplied by the matrix of the intensities of all signals s of cycle t, producing crosstalk compensated intensity values:
$$\begin{aligned} \begin{bmatrix} X_{stA} \\ X_{stC} \\ X_{stG} \\ X_{stT} \\ \end{bmatrix}=X_t^{-1} \begin{bmatrix} S_{stA} \\ S_{stC} \\ S_{stG} \\ S_{stT} \\ \end{bmatrix} \; \end{aligned}$$
(7)
We measured a new quality value for both methods by replacing the intensity \(S_{stc}\) in Eq. (5) by the compensated intensity value \(X_{stc}\).
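The compensation step of Eq. (7) can be sketched as below, assuming the crosstalk matrix \(X_t\) of the cycle is already available; its estimation following Li and Speed [16] is not shown. The function name `compensate_crosstalk` and the array layout are our own.

```python
import numpy as np


def compensate_crosstalk(s_t, x_t):
    """Eq. (7): apply the inverse of X_t to every signal of one cycle.

    `s_t` has shape (n_signals, 4) with channel order A, C, G, T;
    `x_t` is the 4 x 4 crosstalk matrix of that cycle.
    """
    # Solving X_t y = s avoids forming the explicit inverse.
    return np.linalg.solve(x_t, s_t.T).T
```

Solving the linear system gives the same result as multiplying by \(X_t^{-1}\) and is usually the numerically preferable form.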
3.5 Validation Approach
The only “ground truth” available for this type of image data is the a priori knowledge of the barcodes of the probes applied to the tissue section. In this particular experiment, 140 different probes were applied. The barcode length is five letters, meaning that our decoding approach may find 4\(^5\) = 1024 different codes, but only 140 of these codes are expected (TP); any other code found can be assumed to be noise due to poor signal detection/decoding and is considered unexpected (FP). There are of course also other sources of error, such as actual errors in the probes, but these will affect the 2D and 3D approaches equally. Using the quality measure described in Sect. 3.3, an acceptance threshold can be set to balance the signal count vs. the signal precision (TP/(TP+FP)).
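The trade-off between signal count and precision can be sketched as follows. The function name `precision_at_threshold` and its arguments are hypothetical; `expected` stands for the set of barcodes of the applied probes.

```python
def precision_at_threshold(barcodes, qualities, expected, threshold):
    """Return (signal count, precision) for signals with quality >= threshold."""
    kept = [b for b, q in zip(barcodes, qualities) if q >= threshold]
    tp = sum(b in expected for b in kept)        # expected barcodes
    fp = len(kept) - tp                          # unexpected barcodes
    precision = tp / (tp + fp) if kept else float("nan")
    return len(kept), precision
```

Evaluating this function over a range of thresholds gives the count-versus-precision curve from which an acceptance threshold can be chosen.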