Automated multidimensional single molecule fluorescence microscopy feature detection and tracking

Original Paper, European Biophysics Journal

Abstract

Characterisation of multi-protein interactions in cellular networks can be achieved by optical microscopy using multidimensional single molecule fluorescence imaging. Proteins of different species, individually labelled with a single fluorophore, can be imaged as isolated spots (features) of light of different colours in different channels, and their diffusive behaviour in cells directly measured through time. Challenges in data analysis have, however, thus far hindered its application in biology. A set of methods for the automated analysis of multidimensional single molecule microscopy data from cells is presented, incorporating Bayesian segmentation-based feature detection, image registration and particle tracking. Single molecules of different colours can be simultaneously detected in noisy, high background data with an arbitrary number of channels, acquired simultaneously or time-multiplexed, and then tracked through time. The resulting traces can be further analysed, for example to detect intensity steps, count discrete intensity levels, measure fluorescence resonance energy transfer (FRET) or changes in polarisation. Examples are shown illustrating the use of the algorithms in investigations of the epidermal growth factor receptor (EGFR) signalling network, a key target for cancer therapeutics, and with simulated data.




Acknowledgments

The authors gratefully acknowledge the support of the Biotechnology and Biological Sciences Research Council through grant BB/G006911/1.

Author information

Corresponding author

Correspondence to Marisa L. Martin-Fernandez.

Appendices

Appendix 1: Feature detection

Each feature is assumed to have the same shape, or profile, t(x, y), which is normalised so that \(\int_{-\infty}^{\infty} \int_{-\infty}^{\infty} t(x, y) \hbox{d}x \hbox{d}y=1.\) For sources of negligible size t(x, y) will correspond to the point spread function of the microscope at the detector. The signal (excluding noise contributions) in the image at position (x, y) due to a feature at position (x = X, y = Y) with total integrated intensity I is then

$$ s(x, y)=It(x-X, y-Y). $$

Now consider a pixelised image with pixel centres at \(x=x_i\;(i=0, 1, 2,\ldots,N_x-1)\) and \(y=y_j\;(j=0, 1, 2,\ldots,N_y-1)\). The implementation described in this paper uses \(x_i = i + 0.5\), \(y_j = j + 0.5\), i.e. x and y are measured in units of pixel size with pixel centres at half-integer values. The signal in image pixel (i, j) due to a feature at (X, Y) is then

$$ s_{ij}=I\int\limits_{x=i}^{i+1}\int\limits_{y=j}^{j+1}t(x-X, y-Y) \hbox{d}x \hbox{d}y=It_{ij}(X, Y), $$

where the profile has been integrated over the pixel area. \(s_{ij}\) is a component of a vector s of size \(N_x N_y\), which represents the signal in all pixels. Similarly, vector t(X, Y) has components \(t_{ij}\), and corresponds to the vector t in Hobson et al. (2009), except that in Hobson et al. (2009) t is normalised to unit peak; for the definition here the amplitude A simply becomes the intensity I. Note that this integration of the profile over each pixel area correctly models the pixelation of the image, avoiding the “pixelation noise” effect discussed in Thompson et al. (2002).

If t(x, y) is a Gaussian profile with standard deviation \(\sigma_t\),

$$ t(x-X, y-Y)=\frac{1}{2\pi\sigma_t^2}\exp\left(-\frac{(x-X)^2+(y-Y)^2}{2\sigma_t^2}\right) $$

and

$$ \begin{aligned} t_{ij}(X, Y) & = \frac{1}{4} \left(\hbox{erf}{\left({\frac{i+1-X}{\sigma_t\sqrt{2}}}\right)} -\hbox{erf}{\left({\frac{i-X}{\sigma_t\sqrt{2}}}\right)}\right)\\ & \quad \times \left(\hbox{erf}{\left({\frac{j+1-Y}{\sigma_t\sqrt{2}}}\right)} -\hbox{erf}{\left({\frac{j-Y}{\sigma_t\sqrt{2}}}\right)}\right). \end{aligned} $$
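For concreteness, \(t_{ij}(X, Y)\) can be computed directly from this expression. Below is a minimal sketch in Python, assuming NumPy and SciPy are available; the function name and example values are illustrative, not from the published code.

```python
import numpy as np
from scipy.special import erf

def pixel_integrated_gaussian(nx, ny, X, Y, sigma_t):
    """Return t_ij(X, Y): an (ny, nx) array of the Gaussian profile
    integrated over each unit pixel (pixel i spans x in [i, i+1])."""
    i = np.arange(nx)
    j = np.arange(ny)
    s = sigma_t * np.sqrt(2.0)
    # The 2-D profile is separable, so integrate in x and y independently.
    fx = 0.5 * (erf((i + 1 - X) / s) - erf((i - X) / s))
    fy = 0.5 * (erf((j + 1 - Y) / s) - erf((j - Y) / s))
    return np.outer(fy, fx)  # sums to ~1 for a feature well inside the image

# Example: signal s_ij from a feature of intensity I = 500 at (X, Y) = (12.3, 7.8)
s = 500.0 * pixel_integrated_gaussian(25, 15, 12.3, 7.8, sigma_t=1.3)
```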

Bayes’s theorem relates the posterior probability, Pr(M|d, K), that a particular hypothesis (or model), M, is true given some data, d, and background knowledge, K, to the probability, Pr(d|M, K), that the data would have been measured if the hypothesis were true, given the same background information. Pr(d|M, K) is called the likelihood, and is generally much easier to specify directly than the posterior. Bayes’s theorem states that

$$ {\hbox{Pr}}(M|d,K)=\frac{{\hbox{Pr}}(d|M,K){\hbox{Pr}}(M|K)}{{\hbox{Pr}}({d|K})} $$

The hypothesis M may represent a particular model proposed to explain a system, or a set of parameters for such a model, the data set d some observations or measurements of the system, and the background knowledge K is any information about the system known prior to making the observations in question, for example known constraints on particular properties of the system. Pr(M|K) is the prior probability for the hypothesis, encoding the assumed probability of the hypothesis based on background information/bias only, before making the observations. If M represents a set of parameters, then the prior probability would be set to 0 for parameter values known to be impossible. Pr(d|K) is commonly called the evidence. For parameter estimation, where M represents a set of parameters of an assumed model, and the most probable range of parameter values consistent with the data is sought, the evidence is just a normalising factor and can be ignored. In the case of model selection, however, the evidence is crucial.

We wish to compare the posterior probabilities of two hypotheses at each point in the image: \(H_1\), that a feature with some intensity I sits there on a background \(B_1\), and \(H_0\), that there is background \(B_0\) only. The ratio of the posterior probabilities, ρ, is given by

$$ \rho=\frac{{\hbox{Pr}}(H_1|{\bf d})}{{\hbox{Pr}}(H_0|{\bf d})}, $$

where d is the image data (Hobson et al. 2009). From Bayes’s theorem this is

$$ \rho=\frac{{\hbox{Pr}}({\bf d}|H_1){\hbox{Pr}}(H_1)}{{\hbox{Pr}}({\bf d}|H_0){\hbox{Pr}}(H_0)}, $$

where \(\hbox{Pr}(H_1)\) and \(\hbox{Pr}(H_0)\) are the prior probabilities of the two hypotheses, and \(\hbox{Pr}({\bf d}|H_1)\) and \(\hbox{Pr}({\bf d}|H_0)\) are the evidences for models \(H_1\) and \(H_0\) respectively. The evidences are calculated from the likelihoods and prior probabilities over all the possible parameters for each model,

$$ {\hbox{Pr}}({\bf d}|H_1)=\int\int{\hbox{Pr}}({\bf d}|I,B_1,H_1) {\hbox{Pr}}(I,B_1|H_1) \hbox{d}I \hbox{d}B_1 $$

and

$$ {\hbox{Pr}}({\bf d}|H_0)=\int{\hbox{Pr}}({\bf d}|B_0,H_0){\hbox{Pr}}(B_0|H_0) \hbox{d}B_0. $$

Writing \(\rho_0=\frac{{\hbox{Pr}}(H_1)}{{\hbox{Pr}}({H_0})}=\left\langle n \right\rangle\) (where \(\left\langle n \right\rangle\) is the expected number of features per pixel in the image; Hobson et al. (2009)) gives the evidence ratio

$$ R=\frac{\rho}{\rho_0}=\frac{{\hbox{Pr}}({\bf d}|H_1)}{{\hbox{Pr}}({\bf d}|H_0)}. $$

R is calculated for each pixel in the image with the feature profile for model \(H_1\) centred at the pixel centre. Pixels in this evidence map with \(R>R_{\rm min}\) (\(R_{\rm min}=1/\left\langle n \right\rangle\)) can be identified as features, i.e. the locations of the fluorophores, since in these pixels model \(H_1\) is more probable than model \(H_0\). In practice the threshold on R, \(R_{\rm min}\), can be lowered or raised to increase or decrease the feature detection rate (and with it the false discovery rate). Since the expected number of single point emitting sources is not necessarily easy to determine, and there is some freedom in choosing the priors on I and B (e.g. changing \(I_{\rm max}\) and \(B_{\rm max}\), see below), which also changes the evidence, such tuning of the threshold is generally necessary.
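As an illustration of how R can be evaluated, the sketch below computes the evidence ratio for a single image patch by brute-force integration over the uniform priors, assuming independent Gaussian pixel noise of known standard deviation; the likelihood actually used in the paper follows Hobson et al. (2009), and the grid sizes and prior limits here are arbitrary example choices.

```python
import numpy as np

def log_likelihood(d, model, sigma_n):
    # ln Pr(d|model) up to a constant, which cancels in the ratio R
    return -0.5 * np.sum((d - model) ** 2) / sigma_n ** 2

def evidence_ratio(d, t, sigma_n, I_max=2000.0, B_max=20.0, n_grid=64):
    """R = Pr(d|H1)/Pr(d|H0) for an image patch d, with the feature
    profile t (same shape as d) centred on the candidate pixel."""
    I_grid = np.linspace(0.0, I_max, n_grid)
    B_grid = np.linspace(0.0, B_max, n_grid)
    dI, dB = I_grid[1] - I_grid[0], B_grid[1] - B_grid[0]
    # ln Pr(d|I, B1, H1) over the (I, B) grid
    ll1 = np.array([[log_likelihood(d, I * t + B, sigma_n)
                     for B in B_grid] for I in I_grid])
    # ln Pr(d|B0, H0) over the B grid
    ll0 = np.array([log_likelihood(d, np.full_like(d, B), sigma_n)
                    for B in B_grid])
    m = max(ll1.max(), ll0.max())  # subtract the peak to avoid overflow
    ev1 = np.exp(ll1 - m).sum() * dI * dB / (I_max * B_max)  # uniform priors
    ev0 = np.exp(ll0 - m).sum() * dB / B_max
    return ev1 / ev0
```

In a full implementation the same integrals would be evaluated, far more efficiently, at every pixel to build the evidence map.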

Single evidence ratio peaks are identified by segmenting the evidence map above the threshold into local maximum regions using a watershed-like algorithm, as follows.

  1. Find the pixel with the highest value in the evidence map above the threshold, \(R_{\rm min}\), which has not already been identified as associated with a source. If there is no such pixel the segmentation is complete.

  2. Identify this pixel as the location of a new source. This becomes the current pixel and its value of R is called \(R_{\rm current}\).

  3. Grow the set of pixels that are associated with this source by adding any of the eight pixels directly or diagonally adjacent to the current pixel that satisfy \(R_{\rm min} < R \leq R_{\rm current}\). For each such added pixel repeat this step using that pixel as the new current pixel and its R as \(R_{\rm current}\), recursively continuing until no further pixels can be added.

  4. Go to step 1.

This method will find all peaks above \(R_{\rm min}\) and will correctly identify multiple peaks within single regions of the map above \(R_{\rm min}\). For each such segmented region, the pixel with the maximum value of R is identified as the location of the source. See Fig. 10 for an example, and the sketch below for an illustrative implementation.
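The following Python sketch implements the segmentation with the recursion of step 3 replaced by an explicit stack; the array and function names are assumptions of this sketch, not the published code.

```python
import numpy as np

def segment_evidence_map(R_map, R_min):
    """Return a list of (i, j) source locations, one per evidence peak."""
    labels = np.zeros(R_map.shape, dtype=int)  # 0 = not yet assigned
    sources = []
    neighbours = [(-1, -1), (-1, 0), (-1, 1), (0, -1),
                  (0, 1), (1, -1), (1, 0), (1, 1)]
    while True:
        # Step 1: highest unassigned pixel above the threshold
        masked = np.where((labels == 0) & (R_map > R_min), R_map, -np.inf)
        peak = np.unravel_index(np.argmax(masked), R_map.shape)
        if masked[peak] == -np.inf:
            break  # segmentation complete
        # Step 2: the peak is the location of a new source
        sources.append(peak)
        label = len(sources)
        labels[peak] = label
        stack = [peak]
        # Step 3: grow the region, never climbing above R_current
        while stack:
            ci, cj = stack.pop()
            R_current = R_map[ci, cj]
            for di, dj in neighbours:
                ni, nj = ci + di, cj + dj
                if (0 <= ni < R_map.shape[0] and 0 <= nj < R_map.shape[1]
                        and labels[ni, nj] == 0
                        and R_min < R_map[ni, nj] <= R_current):
                    labels[ni, nj] = label
                    stack.append((ni, nj))
        # Step 4: loop back to step 1
    return sources
```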

Fig. 10

Feature detection. Three panels showing the image data, the evidence map and the segmented evidence map. The segmented map shows the identification of two features within one region of the map above R min. The pixel with the highest value of R in each segmented region is the corresponding source location (identified by arrows 1 and 2)

Constraints on the feature intensity and background are imposed by choosing appropriate prior probability distributions. We assume priors \(\hbox{Pr}(I, B_1|H_1) = \pi_I(I)\,\pi_B(B_1)\) and \(\hbox{Pr}(B_0|H_0) = \pi_B(B_0)\), with \(\pi_I(I)\) and \(\pi_B(B)\) uniform for I and B in a finite range, i.e.

$$ \begin{aligned} \pi_I(I) & = \left\{ \begin{array}{ll} 1/I_{\rm max} & {\rm if} \ 0 \le I \le I_{\rm max} \\ 0 & \hbox{otherwise} \end{array} \right.\\ \pi_B(B) & = \left\{ \begin{array}{ll} 1/B_{\rm max} & {\rm if} \ 0 \le B \le B_{\rm max} \\ 0 & \hbox{otherwise} \end{array} \right. . \end{aligned} $$

For each detected feature location the most probable intensity and background level, \(\hat{I}\) and \(\hat{B}_1,\) can be determined by Bayesian parameter estimation (Hobson et al. 2009). Maps of \(\hat{I}\) can be efficiently calculated for each pixel at the same time as R, making estimates of the feature intensities available as soon as the feature locations have been identified.

The covariance matrix of \(P_1 = \hbox{Pr}(I, B|{\bf d}, H_1)\), \({\bf C} = (-{\bf H})^{-1}\), can be used to determine the root mean square uncertainties in \(\hat{I}\) and \(\hat{B}_1\), \(\sigma(\hat{I})=\sqrt{{\bf C}_{11}}\) and \(\sigma(\hat{B}_1)=\sqrt{{\bf C}_{22}}\), where H is the Hessian matrix

$$ {\bf H}=\left( \begin{array}{cc} \frac{\partial^2\ln{P_1}}{\partial I^2} & \frac{\partial^2\ln{P_1}}{\partial I \partial B} \\ \frac{\partial^2\ln{P_1}}{\partial B \partial I} & \frac{\partial^2\ln{P_1}}{\partial B^2} \end{array} \right), $$

evaluated at \((\hat{I}, \hat{B}_1)\). Note that these uncertainties do not account for any error in the feature FWHM or position, and as such are likely to underestimate the error to some extent.
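As a sketch of how these uncertainties might be obtained numerically, the snippet below approximates H by central finite differences of \(\ln{P_1}\) at the maximum and inverts \(-{\bf H}\); lnP1 is assumed to be a callable returning the log posterior, and the step size is an arbitrary example.

```python
import numpy as np

def fit_uncertainties(lnP1, I_hat, B_hat, h=1e-3):
    """Return (sigma(I_hat), sigma(B_hat)) from C = (-H)^(-1)."""
    f = lnP1
    # Central finite-difference second derivatives at (I_hat, B_hat)
    d2_II = (f(I_hat + h, B_hat) - 2 * f(I_hat, B_hat) + f(I_hat - h, B_hat)) / h**2
    d2_BB = (f(I_hat, B_hat + h) - 2 * f(I_hat, B_hat) + f(I_hat, B_hat - h)) / h**2
    d2_IB = (f(I_hat + h, B_hat + h) - f(I_hat + h, B_hat - h)
             - f(I_hat - h, B_hat + h) + f(I_hat - h, B_hat - h)) / (4 * h**2)
    H = np.array([[d2_II, d2_IB],
                  [d2_IB, d2_BB]])
    C = np.linalg.inv(-H)
    return np.sqrt(C[0, 0]), np.sqrt(C[1, 1])
```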

The feature detection was tested by simulating multiple data sets (as described in “Simulations” and “Appendix 3”), each with 100 frames of 100 constant intensity, non-moving features of FWHM 3 pixels, separated by at least 9 pixels. Data sets were simulated with feature intensities varying from 10 to 2,000 photons per frame (integrated over the profile), and with constant uniform background intensity of 0, 1, 5 or 10 photons per pixel per frame. Poisson noise, a linear gain of 200 and readout noise of standard deviation 5 were added. Each simulation was performed using a Gaussian profile and also using an Airy disc profile; in the latter case, the Airy profile contribution to a particular pixel was determined by integrating an oversampled Airy profile over the pixel. The feature detection and measurement were performed on each simulated data set, and the number of features correctly detected and the number of false detections (detections not corresponding to true features) were determined. Figure 11 shows the results as plots of the true-positive and false discovery rates as functions of photons in a feature and signal-to-noise at the peak of a feature. For a signal-to-noise in the peak pixel above about 1 the detection rate is better than 98% and the false discovery rate less than a few percent. There is little difference in these figures whether the model profile uses an Airy disc or a Gaussian profile, confirming that the Gaussian model in the analysis is justified for typical single molecule data. For the simulations with zero background emission but low signal-to-noise, the false discovery rate is a little higher with an Airy disc than with a Gaussian; this appears to be occasional false detection of features from emission in the Airy rings, due to the approximate image noise estimation and the Gaussian profile assumption. However, this should not affect single molecule data from cells, in which there is always some background intensity. There is no difference in the true positive rate (detection rate) between Airy and Gaussian profile simulations. This will be revisited when the full physical noise model has been implemented, to see if it affects this observation.

Fig. 11

Feature detection metrics measured from various simulated data sets (see text). The true-positive rate and false discovery rate as functions of the signal-to-noise at the peak of a feature (upper panel) and the number of photons in a feature (lower panel). The true-positive rate, TPR, is the fraction of true features that were detected, i.e. TPR = number of true features detected/total number of true features. The false discovery rate, FDR, is the fraction of feature detections that do not correspond to a true feature, FDR = number of false positives/total number of detections (true and false). The signal-to-noise, S/N, is defined here as S/N = model intensity at peak of feature/SD of intensity at peak

Appendix 2: Registration transformation

The function to transform a position \({\bf r}=\left( \begin{array}{c} x \\ y \end{array} \right)\) in the non-reference channel to the corresponding point \({\bf r}_{\rm ref}=\left( \begin{array}{c} x_{\rm ref} \\ y_{\rm ref} \end{array} \right)\) in the reference channel is

$$ \begin{aligned} {\bf r}_{\rm ref} & = {\bf c}_{\rm ref}+{\bf Rot}(-\theta){\bf Rot}(\theta_S){\bf S} \,{\bf Rot}(-\theta_S)\left({\bf r}-{\bf c}+{\bf D}\right) \\ & = {\bf A}({\bf r}, {\bf T}). \end{aligned} $$

\({\bf c}\) and \({\bf c}_{\rm ref}\) are the midpoints of the channel image region for the non-reference and reference channels respectively, and are adopted as sensible origins for the transformations. Rot(ϕ) is the matrix \(\left( \begin{array}{cc} \cos\phi & -\sin\phi\\ \sin\phi & \cos\phi \end{array} \right)\), which rotates a vector clockwise through angle ϕ. \({\bf D}=\left( \begin{array}{c} D_x \\ D_y \end{array} \right)\) is the translation to be applied, \({\bf S}=\left(\begin{array}{cc} S_1 & 0\\ 0 & S_2 \end{array} \right)\) specifies the orthogonal scaling factors, \(S_1\) and \(S_2\), \(\theta_S\) is the angle specifying the orientation of the scaling axes and θ is the angle of the rotation. The parameters to be determined are \({\bf T} = [D_x, D_y, S_1, S_2, \theta, \theta_S]\).
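This transformation translates directly into code. The following NumPy sketch is a transcription of the expression above; the function names are illustrative.

```python
import numpy as np

def rot(phi):
    """Rot(phi): matrix rotating a vector clockwise through angle phi."""
    return np.array([[np.cos(phi), -np.sin(phi)],
                     [np.sin(phi),  np.cos(phi)]])

def register(r, T, c, c_ref):
    """A(r, T): map a point r in the non-reference channel to the
    reference channel. T = [Dx, Dy, S1, S2, theta, theta_S]."""
    Dx, Dy, S1, S2, theta, theta_S = T
    S = np.diag([S1, S2])  # orthogonal scaling factors
    M = rot(-theta) @ rot(theta_S) @ S @ rot(-theta_S)
    return c_ref + M @ (r - c + np.array([Dx, Dy]))
```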

The Scott and Longuet-Higgins algorithm (Scott and Longuet-Higgins 1991) is used to match beads in each non-reference channel with their counterparts in the reference channel. If the positions of the \(N_{\rm bead,ref}\) beads identified in the reference channel are \({\bf r}_{{\rm bead,ref},i}\;(i=1\ldots N_{\rm bead,ref})\) and in the non-reference channel \({\bf r}_{{\rm bead},j}\;(j=1\ldots N_{\rm bead})\), a proximity matrix, G, is calculated with elements \(G_{ij}=\exp\left(-\frac{\left|{\bf r}_{{\rm bead,ref},i} -{\bf r}_{{\rm bead},j}\right|^2}{\sigma^2_{\rm slh}}\right)\), where \(\sigma_{\rm slh}\) controls the scale on which features in different channels may be associated and influences the effectiveness of the algorithm. A value \(\sigma_{\rm slh} = 10\) pixels was used. The algorithm exploits the properties of the singular value decomposition of G to produce a modified proximity matrix that can be used to identify which bead in the reference channel corresponds to which in the non-reference channel. For the small rotations (typically <1°) and scalings (typically <4%) found, and provided the translations are not too large (approximately <10 pixels), this algorithm works very well. In cases where the offset, D, is larger than about 10 pixels and the algorithm fails, the problem is easily overcome by trying offsets \({\bf r}_{\rm off}=\left(\begin{array}{c} 10p \hbox{ pix} \\ 10q \hbox{ pix} \end{array} \right)\) added to \({\bf r}_{{\rm bead},j}\), with integers p and q both in the range \(-3\ldots 3\), before calculating the proximity matrix.
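A sketch of the pairing step, assuming NumPy, is shown below. The modified proximity matrix is obtained by setting all singular values of G to one, and a pair is accepted when its entry dominates both its row and its column, following Scott and Longuet-Higgins (1991); the variable names are illustrative.

```python
import numpy as np

def match_beads(ref, pos, sigma_slh=10.0):
    """Return (i, j) pairs matching reference beads ref (N1, 2) to
    non-reference beads pos (N2, 2)."""
    # Proximity matrix G_ij from squared inter-bead separations
    d2 = ((ref[:, None, :] - pos[None, :, :]) ** 2).sum(axis=2)
    G = np.exp(-d2 / sigma_slh ** 2)
    # Replace the singular values by 1 to get the pairing matrix P
    U, _, Vt = np.linalg.svd(G, full_matrices=False)
    P = U @ Vt
    pairs = []
    for i in range(P.shape[0]):
        j = int(np.argmax(P[i]))
        if i == int(np.argmax(P[:, j])):  # mutual best association
            pairs.append((i, j))
    return pairs
```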

With corresponding beads in each channel identified, determination of the transformation parameters simply requires finding the transformation for which non-reference channel beads are best aligned with their reference channel counterparts; it is not necessary to consider all possible combinations of beads in one channel with beads in the other. If the corresponding pairs of the \(N_{\rm matched}\) matched beads have positions \({\bf r}_{{\rm bead},k}\) and \({\bf r}_{{\rm bead,ref},k}\;(k=1\ldots N_{\rm matched})\), define

$$ \Updelta r_{{\rm bead},k}({\bf T})=\left|{\bf A}({\bf r}_{{\rm bead},k}, {\bf T})-{\bf r}_{{\rm bead,ref},k}\right|, $$

the difference between the reference bead position and the position of the non-reference bead registered with the transformation T. One approach to determining the optimal transformation parameters would be to find the T that minimises the sum of the squared \(\Updelta r_{{\rm bead},k}({\bf T})\)s, i.e. to minimise \(\sum_{k=1}^{N_{\rm matched}} \Updelta r_{{\rm bead},k}({\bf T})^2\); however, this method would allow any mismatched beads to influence the determined transformation. Although the Scott and Longuet-Higgins method does a good job of matching beads, there are often a small number of mismatched beads (Fig. 1c). An alternative objective function, P(T), is therefore used, with

$$ P({\bf T})=\sum_{k=1}^{N_{\rm matched}} \exp\left(-\frac{\Updelta r_{{\rm bead},k}({\bf T})^2}{2\sigma_P^2}\right), $$

which must be maximised, where \(\sigma_P\) controls the sensitivity to mismatch. In this objective function any beads that are significantly mismatched compared to \(\sigma_P\) will produce small local peaks in P(T) away from the main maximum, so as long as the optimiser finds the main maximum these mismatched beads will not influence the final parameters. Two passes of optimisation are performed, first with \(\sigma_P = 10\) pixels, which broadens the maximum, ensuring the global maximum is identified approximately, then a second pass beginning at the maximum of P(T) from the first pass but with \(\sigma_P\) set to the standard deviation of the feature profiles, \(\sigma_t\). The second pass thus homes in on the peak more precisely, with P(T) approximating a correlation coefficient of the profile of each bead with its counterpart. Optimisation of P(T) is performed using the Downhill Simplex method (Press et al. 1992).
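The two-pass optimisation might be sketched as follows, using SciPy's Nelder–Mead implementation of the Downhill Simplex method; register() is the transformation sketch from earlier in this appendix, and the remaining names are illustrative assumptions.

```python
import numpy as np
from scipy.optimize import minimize

def neg_P(T, beads, beads_ref, c, c_ref, sigma_P):
    """-P(T): negated objective, since the optimiser minimises."""
    dr2 = np.array([np.sum((register(r, T, c, c_ref) - r_ref) ** 2)
                    for r, r_ref in zip(beads, beads_ref)])
    return -np.sum(np.exp(-dr2 / (2.0 * sigma_P ** 2)))

def fit_transformation(beads, beads_ref, c, c_ref, sigma_t):
    T0 = np.array([0.0, 0.0, 1.0, 1.0, 0.0, 0.0])  # identity transform
    # Pass 1: sigma_P = 10 pixels broadens the maximum
    res = minimize(neg_P, T0, args=(beads, beads_ref, c, c_ref, 10.0),
                   method='Nelder-Mead')
    # Pass 2: refine with sigma_P = sigma_t from the pass-1 optimum
    res = minimize(neg_P, res.x, args=(beads, beads_ref, c, c_ref, sigma_t),
                   method='Nelder-Mead')
    return res.x
```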

To assess the quality of the determined transformation solution, three scores are calculated. Bead positions are registered, and those that are within \(\sigma_t/2\) of their corresponding reference channel bead are identified as matched. The scores are

  • \(f_{\rm matched}\), the ratio of matched features to maximum possible matches, where the number of maximum possible matches is the sum of \(\min(N_{\rm bead}, N_{\rm bead,ref})\) over images of beads,

  • \(\Updelta r_{69}\) and \(\Updelta r_{95}\), the largest separation of the best matched 69%/95% of the matched beads.

\(f_{\rm matched} = 0.7\) is used as a threshold below which registration is assumed to have failed. \(\Updelta r_{69}\) and \(\Updelta r_{95}\) are useful diagnostics giving an idea of the typical agreement of registered features, but they combine the localisation uncertainty in the measurement of the beads with the error in the registration. As such they can be used as estimates of the upper limit of the registration error, with \(\Updelta r_{95}\) the more conservative estimate.
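A small sketch of these scores, assuming the separations of the matched beads after registration have been collected into an array dr, is:

```python
import numpy as np

def registration_scores(dr, n_possible):
    """Return f_matched, delta_r_69 and delta_r_95 for separations dr."""
    f_matched = dr.size / n_possible
    dr_sorted = np.sort(dr)
    dr69 = dr_sorted[int(np.ceil(0.69 * dr.size)) - 1]  # best-matched 69%
    dr95 = dr_sorted[int(np.ceil(0.95 * dr.size)) - 1]  # best-matched 95%
    return f_matched, dr69, dr95
```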

Appendix 3: Simulations

This appendix details the model used for creating simulated data sets that may then be used to test the single molecule analysis methods.

  • Registration transformations for each non-reference channel (as described in the “Registration transformation” section) are generated. The parameters \([D_x, D_y, S_1, S_2, \theta, \theta_S]\) for each channel are sampled from probability distributions that approximate the typical range of values seen in real data sets.

  • A number, \(N_{\rm tracks}\), of tracks is modelled.

    • Each track consists of a group of \(N_{\rm fluor}\) co-located fluorophores moving together and individually fluorescing/blinking/bleaching.

    • \(N_{\rm fluor}\) is sampled from a Poisson distribution of specified mean (currently 2), avoiding \(N_{\rm fluor} = 0\).

    • Each fluorophore is either off (not yet fluorescing), on (fluorescing), blinking or photo-bleached.

    • Fluorophores are individually switched on, may then blink, recover and photo-bleach all at times determined randomly according to rates specified for each event.

    • Each track has an initial position, speed and direction of motion chosen at random, the speed chosen from a specified normal distribution. At each time step the direction and speed are modified by applying changes sampled from normal distributions with specified standard deviations.

    • At each time point the track intensity is \(ev_{\rm fac} I_{\rm fluor}\) times the number of fluorescing fluorophores, where \(I_{\rm fluor}\) is the individual fluorophore intensity and \(ev_{\rm fac}=\exp(-d/d_{\rm ev})\). d is the depth of the fluorophores in the evanescent field, and is constant and randomly chosen for each track between 0 and twice the evanescent field depth, \(d_{\rm ev}\).

    • At present the intensity of fluorophores is the same in each channel. Models for different types of experiment will be implemented to improve on this, e.g. FRET and co-localisation experiments.

  • Single molecule images are then modelled at each time point, considering each fluorophore group as a Gaussian spot of common full width half maximum (FWHM). Real images contain background structures on a variety of spatial scales, for example auto-fluorescence, which will challenge feature detection algorithms. To simulate background emission with spatial structure on a range of spatial scales, a single image consisting of Perlin noise is generated and used as the background signal in all channels at every time point. This Perlin noise image consists of random spatial fluctuations in intensity, on spatial scales L ranging from the model feature profile FWHM to the image size, with amplitudes proportional to 1/L. This is not necessarily an accurate reflection of the background structure, but is a simple parametrised way to add challenging background emission that can be improved upon when the background properties are better characterised. The intensity incident on each pixel is then the sum of the contributions from fluorophores and the background structure, from which the number of photons in the pixel is Poisson sampled. The quantum efficiency of the detector is modelled by sampling from a binomial distribution (which, convolved with the initial Poisson distribution, gives another Poisson) and a linear gain is then applied, followed by the addition of Gaussian noise and an offset to model the bias and readout noise. This is a simplified model for a CCD response (an illustrative sketch is given after this list). Corresponding images are produced for each channel, but with the background image, fluorophore positions and FWHMs transformed according to the registration vectors before sampling onto the model CCD/pixel grid.

  • Bead images are also simulated, where bead samples are modelled as a random spatially uniform distribution of Gaussian point sources, sampled on the model detector using the same detector model as above.
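An illustrative version of the simplified CCD response described in the image-modelling step above might read as follows; the gain of 200 and readout noise of 5 match the simulations of Appendix 1, while the quantum efficiency and offset values are example assumptions of this sketch.

```python
import numpy as np

def ccd_response(incident, qe=0.9, gain=200.0, read_noise=5.0,
                 offset=100.0, rng=None):
    """incident: per-pixel intensity (photons) from fluorophores plus
    background. Returns simulated detector counts."""
    rng = rng or np.random.default_rng()
    photons = rng.poisson(incident)       # photon shot noise
    detected = rng.binomial(photons, qe)  # quantum efficiency (binomial
                                          # thinning of a Poisson is Poisson)
    counts = gain * detected              # linear gain
    # Bias (offset) and Gaussian readout noise
    return counts + rng.normal(offset, read_noise, size=counts.shape)
```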


Cite this article

Rolfe, D.J., McLachlan, C.I., Hirsch, M. et al. Automated multidimensional single molecule fluorescence microscopy feature detection and tracking. Eur Biophys J 40, 1167–1186 (2011). https://doi.org/10.1007/s00249-011-0747-7
