1 Introduction

Ideally, a microscope discerns and maps all features in the specimen. Until the dawn of the 21st century, it was generally accepted that an optical microscope using lenses would not be able to discern features on the nanoscale [1]. By the mid 1990s, however, physical concepts for overcoming the longstanding diffraction barrier had been introduced, and the development of super-resolved far-field fluorescence microscopy (or ‘nanoscopy’) has since progressed tremendously. At a deep level, the reason for the vastly increased resolution capabilities of modern fluorescence nanoscopes—to resolve molecules with distances of even just a few nanometers—has been a major paradigm shift: the discrimination of molecules is no longer realized by the focusing of the light in use. Rather, the molecules are transiently transferred to different states, usually fluorescence ‘on’ and fluorescence ‘off’ states, so that they are distinguishable when using a (diffraction-limited) illumination pattern to probe their signals [2].

An important aspect for mapping the molecules is that the transient state change can occur in a spatially controlled (i.e., coordinate-targeted) or in a spatially stochastic manner. The first kind is realized in methods called stimulated emission depletion (STED) [3, 4], saturated structured-illumination microscopy (SSIM) [5] and reversible saturable/switchable optical fluorescence transitions (RESOLFT) [6, 7]. In these approaches, a pattern of light with one or multiple intensity minima switches the molecules optically between an ‘on’ and an ‘off’ state, thus transferring all molecules to one of these states except those located at or near the intensity minima. Scanning the pattern of light across the specimen ensures that every molecule ends up in a subdiffraction-sized region at least once, and hence is for that time in a different state from its resolved neighbors.

The highest combined resolution along all three spatial dimensions has been achieved for STED or RESOLFT microscopes with two opposing lenses, where the illumination as well as the detected fluorescence light from both lenses can be combined in a coherent manner [8,9,10].

Spatially stochastic methods, among them photoactivated localization microscopy (PALM) [11] and stochastic optical reconstruction microscopy (STORM) [12], independently switch molecules that lie in close proximity to an on-state in which the individual molecules can emit multiple and ideally a large number of fluorescence photons. The collected signal is then used to estimate the position of the isolated molecules with subdiffraction precision. Again, the use of two opposing lenses coherently combining the fluorescence light has delivered very high and close-to-isotropic 3D resolution [13].

A recent breakthrough development, MINFLUX, has attained the ultimate molecule-size, one-nanometer resolution scale at minimal fluxes of emitted fluorescence photons [14]. MINFLUX operates with spatially stochastic single-molecule switching, but makes use of one or more coordinate-giving intensity minima of excitation light to make the controlled, known position of a minimum coincide with the molecule position and determine it very efficiently in terms of registered fluorescence photons.

Although the spatially stochastic methods can provide molecular maps [15], counting molecules with stochastic methods is not as straightforward as it may appear. Molecules which do not emit sufficient numbers of photons while residing in the on-state to be detected, or which do not assume this state at all, are missed completely. Other molecules might occupy the on-state repeatedly and might therefore be counted multiple times, which requires a careful and non-trivial calibration. For these reasons, fluorophores which assume the on-state only once would be favorable in principle, but such fluorophores would allow only a single super-resolution recording, meaning that the molecular counting would not be repeatable. As an additional aspect to consider, counting molecules one by one requires an extended recording time in which the molecules must remain stationary.

In contrast, STED and RESOLFT microscopy are not based on single-molecule detection and, by registering signals from all molecules from a given coordinate simultaneously, they provide a potential speed advantage. However, this very same fact has made the counting of molecules in ensemble-based nanoscopy imaging modes challenging. In most cases, the actual brightness of individual molecules in an experiment remained elusive, because the sample did not contain spatially sparse molecules found on their own. Furthermore, local environment heterogeneity can induce local variations of the molecular brightness. However, if the average contribution of a single molecule to the recorded image can be reliably estimated, the number of participating molecules can simply be deduced from the magnitude of the fluorescence signal.

A reliable method to extract the numbers of molecules in STED or RESOLFT microscopy is very desirable, and substantial progress has been achieved towards this goal. Indeed, a careful analysis of the photon arrival statistics in STED and RESOLFT imaging, especially the study of (1) occurrences of simultaneous arrivals of fluorescence photons in STED as well as (2) the fluctuations in signal of repeated recordings at the same scan position in RESOLFT, reveals higher-order dependencies of the recorded photon statistics on the number of molecules and their brightness. Such an analysis makes it possible to disentangle number and brightness and thus map the number of molecules in an image. The effects and statistical signatures harnessed are fully compatible with the subdiffraction resolution of STED and RESOLFT and can therefore readily be applied also in a live-cell imaging regime.

In the following sections, these two relatively new quantitative nanoscopy methods will be presented, with an emphasis on the statistical modeling that goes beyond a purely ‘classical’ shot-noise description of photon statistics.

1.1 Molecular Contribution Function (MCF)

In analogy to the point spread function (PSF), which corresponds to the image of a theoretical point source, the molecular contribution function (MCF) can be defined as the quantitative spatial distribution of the average number of photons counted from a single fluorophore. With knowledge of the MCF, counting molecules becomes a conceptually straightforward task of normalizing the image by the MCF. This holds for all linear imaging systems, i.e., systems where the recorded image is the sum of the contributions of all the individual molecules. For example, a space-invariant, linear imaging system expresses the measured image \(Y(\mathbf {x})\) at each scan position \(\mathbf {x}\) as the convolution of the number density \(n(\mathbf {x})\) with the \(\text {MCF}(\mathbf {x})\)

$$\begin{aligned} Y(\mathbf {x}) = \left( n * \text {MCF}\right) (\mathbf {x}) + \varepsilon (\mathbf {x}), \end{aligned}$$
(7.1)

where \(*\) denotes the convolution operator and \(\varepsilon (\mathbf {x})\) describes the measurement noise, i.e., the deviation of a particular recorded image from the average (noise-free) image.

Estimation of the unknown number density \(n(\mathbf {x})\) from the data then amounts to simply deconvolving the data with a carefully calibrated MCF.

Counting the number of molecules in defined isolated regions, which in principle could be as small as the resolution scale, means a division of the summed image data in those regions by the average total signal of a single molecule, i.e., the integral over the MCF.
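The normalization described above can be illustrated with a minimal one-dimensional sketch. It assumes a noise-free image, a Gaussian-shaped MCF and two well-separated clusters; the function names, grid size and parameter values are illustrative choices and are not taken from the experiments discussed in this chapter.

```python
import math

def gaussian_mcf(half_width, sigma, peak):
    """Pixelated 1D Gaussian MCF: mean photons per molecule at each pixel offset.

    The Gaussian shape and the peak value are assumptions for illustration.
    """
    return [peak * math.exp(-0.5 * (k / sigma) ** 2)
            for k in range(-half_width, half_width + 1)]

def render_image(density, mcf):
    """Noise-free image: discrete convolution of the number density with the MCF."""
    half = len(mcf) // 2
    image = [0.0] * len(density)
    for u, n_u in enumerate(density):
        if n_u == 0:
            continue
        for k, value in enumerate(mcf):
            x = u + k - half
            if 0 <= x < len(image):
                image[x] += n_u * value
    return image

def count_in_region(image, start, stop, mcf):
    """Summed signal in pixels [start, stop) divided by the integral of the MCF."""
    return sum(image[start:stop]) / sum(mcf)

mcf = gaussian_mcf(half_width=10, sigma=2.0, peak=30.0)
density = [0] * 100
density[30] = 7   # isolated cluster of 7 molecules
density[70] = 3   # isolated cluster of 3 molecules
image = render_image(density, mcf)

n_left = count_in_region(image, 0, 50, mcf)
n_right = count_in_region(image, 50, 100, mcf)
print(round(n_left, 3), round(n_right, 3))
```

The division only returns the correct number if the region contains the complete signal of its molecules and no signal from neighbors, and if the MCF integral is known. With measurement noise the result becomes an estimate with a corresponding uncertainty.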

Even though a considerable part of the theory in this chapter is presented using continuous functions mostly for the sake of simplicity of notation, it is understood that all recorded microscopic data is pixelated with a pixel size smaller than the spatial resolution. The transition between continuous and discrete data grids is straightforward and may be realized implicitly wherever it is convenient.

Note that every fluorescence microscopy technique with linear imaging conditions and independent and identically behaving fluorophores features an MCF, which lends itself to mapping the number of fluorophores. However, the MCF and the total signal of a single molecule will depend on the used optics, measurement properties and chosen fluorophores. The main task of quantitative STED/RESOLFT nanoscopy therefore lies in determining the average signal per fluorophore intrinsically from the dataset itself.

2 STED Nanoscopy with Coincidence Photon Detection

Measurements of the statistics of simultaneous photon arrivals in fluorescence microscopy have been shown to identify individual fluorophores [16, 17], to improve the resolution of fluorescence microscopy [18, 19] and have been used to analyze individual clusters of molecules distributed in space [20, 21]. In the next section, a full imaging model of simultaneous photon arrivals in confocal and STED microscopy is derived, and afterwards a live-cell imaging experiment will be described.

In a scanning fluorescence microscope with a pulsed illumination light source (the preferred approach for photon coincidence measurements), the specimen at each recorded pixel position experiences a certain number of light pulses. If excitation of the molecules is accomplished with pulses with a duration much shorter than the fluorescence lifetime then each molecule can be excited at most once per pulse. The probability of a molecule to be excited and to yield a detected photon as the focal center of the scanning and the coordinate of the molecule coincide is named the molecular brightness \(\lambda \). Reflecting changes in local environment, it may change with the location of a molecule, but here it is assumed that all molecules at a given location (within a resolved region) also share the same brightness and that \(\lambda (\mathbf {x})\) remains constant over time. The optimal case of \(\lambda =1\) would be reached if the molecule contributed with a detected photon for every pulse. In practice, strong excitation leads to increased photo-bleaching and a widening of the spatial region in which fluorophores are in the saturated regime (broadening of the excitation spot), which must be avoided. The quantum yield of the fluorophores is limited and the detection optics cannot collect and detect every emitted photon, resulting in a molecular brightness considerably below one (typically on the order of 0.01). For molecules not at the center of the scanned focal spot, the effective molecular brightness has to be additionally scaled by the PSF (h). These aspects are contained in the MCF. The goal is to measure the statistics of the numbers of detected photons from the sample after each light pulse.

In a suitable experimental arrangement [22], the detected photons from all molecules at a scan coordinate are distributed by an array of beam splitters randomly onto four equally sensitive detectors in order to measure the numbers of detected photons for every pulse (see Fig. 7.3a). Detectors with at least one assigned photon are considered active and count as a detection event. The number of active detectors thus becomes the experimentally accessible value. A direct quantitative detection of the numbers of fluorescent photons with a time resolution of at least the duration between consecutive light pulses would simplify the statistical model as well as the experimental setup and would therefore be highly desirable. At the time when these experiments were conducted, however, an array of APD detectors, each capable of counting single photons but with a dead time larger than the fluorescence lifetime, was deemed an acceptable compromise between the ability to detect simultaneously arriving photons and the experimental effort.

2.1 Statistical Model

The probability for a single molecule to emit more than one photon in the duration of the excitation pulse is negligible. Therefore the photon emission process can be well described by a multinomial random process with \(\lambda _i\) the molecular brightnesses of molecules \(i=1,..,N\) located at positions \(\mathbf {u}_i\). The probability \(\mathfrak {p}_i\) of a molecule to contribute with a photon to the detection at the current scanning position \(\mathbf {x}\) is \(\mathfrak {p}_i(\mathbf {x})=\lambda _i h(\mathbf {x} - \mathbf {u}_i)\) with the PSF of the system h.

Due to superposition and independence of the molecular markers, for each scan position the number of contributed photons follows a discrete probability distribution of a sum of independent Bernoulli trials with parameters \(\mathfrak {p}_i\). The probability that exactly k photons contribute during a single pulse is denoted by \(Q_k(\mathbf {x})\). The expressions for \(k=1,2\) become

$$\begin{aligned} Q_1(\mathbf {x})= & {} \sum _{i=1}^N \mathfrak {p}_i(\mathbf {x})\prod _{j\ne i} \left( 1-\mathfrak {p}_j(\mathbf {x})\right) \nonumber \\ Q_2(\mathbf {x})= & {} \sum _{i=1}^N \mathfrak {p}_i(\mathbf {x})\sum _{j>i}^N \mathfrak {p}_j(\mathbf {x}) \prod _{k\ne i,j} \left( 1-\mathfrak {p}_k(\mathbf {x})\right) \end{aligned}$$
(7.2)

expressing all possibilities of exactly one or two molecules contributing with a photon. The \(\mathfrak {p}_i(\mathbf {x})\) are much less than unity, and one can simplify these expressions further by neglecting terms with higher orders in \(\mathfrak {p}_i(\mathbf {x})\):

$$\begin{aligned} Q_1(\mathbf {x})\simeq & {} \sum _{i=1}^N \mathfrak {p}_i(\mathbf {x}) \nonumber \\ Q_2(\mathbf {x})\simeq & {} \frac{1}{2}\left[ \left( \sum _{i=1}^N \mathfrak {p}_i(\mathbf {x})\right) ^2-\sum _{i=1}^N \mathfrak {p}_i(\mathbf {x})^2\right] \end{aligned}$$
(7.3)

\(Q_2\) is approximately given by the probability to obtain two photons from any molecule minus the so-called ‘antibunching term’ \(\sum _{i=1}^N\mathfrak {p}_i(\mathbf {x})^2\), which accounts for the unphysical case that two photons would originate from the very same molecule.
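The quality of the approximation in (7.3) can be checked numerically with a short sketch. It builds the exact distribution of a sum of independent Bernoulli trials and compares \(Q_1\) and \(Q_2\) with the simplified expressions. The five probabilities used here (a brightness of 0.013 scaled by assumed PSF values) are illustrative only.

```python
def exact_q(probs):
    """Exact distribution of the number of contributing photons per pulse.

    The distribution of a sum of independent Bernoulli trials is built by
    adding one molecule at a time; entry k is the probability Q_k.
    """
    dist = [1.0]
    for p in probs:
        new = [0.0] * (len(dist) + 1)
        for k, q in enumerate(dist):
            new[k] += q * (1.0 - p)      # this molecule contributes no photon
            new[k + 1] += q * p          # this molecule contributes one photon
        dist = new
    return dist

def approx_q1_q2(probs):
    """Lowest-order approximations of Q_1 and Q_2 as in (7.3)."""
    s1 = sum(probs)
    s2 = sum(p * p for p in probs)       # antibunching term
    return s1, 0.5 * (s1 * s1 - s2)

# Assumed example: five molecules with brightness 0.013 at different PSF values.
probs = [0.013 * h for h in (1.0, 0.9, 0.7, 0.5, 0.3)]
q = exact_q(probs)
q1_approx, q2_approx = approx_q1_q2(probs)

print("Q1 exact / approx:", q[1], q1_approx)
print("Q2 exact / approx:", q[2], q2_approx)
```

Because the neglected factors \(1-\mathfrak {p}_j\) are all slightly below one, the approximations overestimate the exact values; for this example the relative deviation stays at the level of a few percent.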

2.1.1 Distribution on Active Detectors

Because the utilized detectors are not able to quantitatively detect the number of incident photons, information about the numbers of contributing photons is partially lost. This loss can be taken into account by geometrical factors. For example, \(\beta =(d-1)/d\) is the probability that two photons are registered on two different detectors for d available detectors. Let \(D_i(\mathbf {x})\) be the mean number of active detectors at each scan position. Neglecting higher order terms of \(Q_i\) gives

$$\begin{aligned} D_1(\mathbf {x})\simeq & {} Q_1(\mathbf {x}) \nonumber \\ D_2(\mathbf {x})\simeq & {} \beta Q_2(\mathbf {x}) \end{aligned}$$
(7.4)
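The geometrical factor \(\beta \) can be verified by brute-force enumeration. The sketch below assumes that every photon is routed independently and with equal probability to one of the d detectors, and lists the exact distribution of the number of active detectors for k photons. The function name is a hypothetical helper.

```python
from fractions import Fraction
from itertools import product

def active_detector_distribution(k, d=4):
    """Exact distribution of the number of active detectors.

    Assumes each of the k photons hits one of d equally sensitive detectors
    uniformly at random; a detector is active if it receives at least one photon.
    """
    counts = {}
    for assignment in product(range(d), repeat=k):
        active = len(set(assignment))
        counts[active] = counts.get(active, 0) + 1
    total = d ** k
    return {a: Fraction(c, total) for a, c in sorted(counts.items())}

two_photons = active_detector_distribution(2, d=4)
three_photons = active_detector_distribution(3, d=4)
print(two_photons)
print(three_photons)
```

For two photons, the probability of two active detectors equals \((d-1)/d\), in agreement with the definition of \(\beta \).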

Using \(d=4\) detectors results in a loss of about 25% of the two-photon incidence events compared to the ideal case of detecting all contributing photons. \(n(\mathbf {x})\) is denoted as the local fluorophore density and \(\lambda (\mathbf {x})\) as the molecular brightness (defined only where \(n(\mathbf {x})>0\)). The transition to a continuous grid can be easily performed by

$$\begin{aligned} \sum _{i=1}^N \mathfrak {p}_i(\mathbf {x})^j \rightarrow (\lambda ^j n) *h^j \nonumber \end{aligned}$$

with \(*\) the convolution operator. On a continuous grid, the mean number of active detectors per light pulse becomes

$$\begin{aligned} D_1^m(\mathbf {x})= & {} \left( (\lambda n)*h_m\right) (\mathbf {x}) \nonumber \\ D_2^m(\mathbf {x})= & {} \frac{\beta }{2}\left[ \left( (\lambda n)*h_m\right) ^2-(\lambda ^2n)*h_m^2\right] (\mathbf {x}) \end{aligned}$$
(7.5)

where m denotes the imaging mode (\(m=\{c,s\}\) for confocal or STED recordings), which affects the width of the PSF. If \((\lambda n)*h\) is not much smaller than one, for example if the number of simultaneously recorded molecules is large, additional, higher-order terms must be included. Expressions including a possible Poissonian background contribution and higher-order terms are given in [22] (see Supplementary Note). \(D_2^m\) includes the interesting ‘antibunching term’ \((\lambda ^2n)*h_m^2\), which depends quadratically on \(\lambda (\mathbf {x})\), thus making it a critical parameter.
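As an illustration of (7.5), the following sketch evaluates the mean numbers of one- and two-detector events on a one-dimensional pixel grid. The Gaussian PSF normalized to \(h(0)=1\), the grid size, the cluster positions and the helper names are assumptions made for this example; background and higher-order terms are omitted.

```python
import math

def convolve_same(signal, kernel):
    """Discrete convolution with a centered, odd-length kernel (zero padding)."""
    half = len(kernel) // 2
    out = [0.0] * len(signal)
    for u, s in enumerate(signal):
        if s == 0.0:
            continue
        for k, w in enumerate(kernel):
            x = u + k - half
            if 0 <= x < len(out):
                out[x] += s * w
    return out

def gaussian_psf(fwhm_px, half_width):
    """Gaussian PSF sampled on pixels, normalized to h(0) = 1 (assumed shape)."""
    sigma = fwhm_px / (2.0 * math.sqrt(2.0 * math.log(2.0)))
    return [math.exp(-0.5 * (k / sigma) ** 2)
            for k in range(-half_width, half_width + 1)]

def active_detector_means(n, lam, psf, d=4):
    """Mean one- and two-detector events per pulse following (7.5)."""
    beta = (d - 1) / d
    psf_squared = [h * h for h in psf]
    first = convolve_same([lam_x * n_x for lam_x, n_x in zip(lam, n)], psf)
    anti = convolve_same([lam_x * lam_x * n_x for lam_x, n_x in zip(lam, n)],
                         psf_squared)
    d1 = first
    d2 = [0.5 * beta * (a * a - b) for a, b in zip(first, anti)]
    return d1, d2

psf = gaussian_psf(fwhm_px=6.0, half_width=15)
n = [0.0] * 60
n[20] = 10.0                 # cluster of ten molecules
n[40] = 1.0                  # single molecule
lam = [0.013] * 60           # uniform molecular brightness
d1, d2 = active_detector_means(n, lam, psf)
print(d1[20], d2[20], d1[40], d2[40])
```

At the position of the single molecule the two terms in the bracket cancel, so no two-detector events are expected there. This is the signature of antibunching that the estimation exploits.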

2.1.2 Estimation of Molecule Density and Brightness Distribution

The average number of active detectors \(D_{1,2}^m\) can be estimated empirically by the measured occurrences of active detector events in an experiment \(Y_{1,2}^m(\mathbf {x})\) normalized by the number of repetitions t, i.e., the number of laser light pulses applied per pixel. The desired molecule density and brightness distributions are ultimately extracted by fitting the data with the model in (7.5). This is accomplished by the fast proximal gradient algorithm FISTA [23], which minimizes the squared distance between the model and the experiment while also penalizing strong variations in \(\lambda (\mathbf {x})\). The estimated molecule density \(\hat{n}(\mathbf {x})\) and molecular brightness \(\hat{\lambda }(\mathbf {x})\) are the solution of the constrained optimization problem:

$$\begin{aligned}&\mathrm {argmin}_{n,\lambda } \sum _{m={c,s}}\sum _{i=1,2} \alpha _{im} \left\| D_i^m(n, \lambda ) - Y_i^m/t\right\| ^2 + \gamma \phi (\lambda ) \nonumber \\&n(\mathbf {x}) \ge 0, \lambda (\mathbf {x}) \ge 0 \end{aligned}$$
(7.6)

with \(\alpha _{im}\) and \(\gamma \) positive weighting parameters and \(\phi \) a typical penalization term, the Laplacian of the brightness (\(\phi (\lambda )=\nabla ^2\lambda \)) in order to enforce smoothness in the brightness distribution.

Fig. 7.1

Figure adapted from [22]

Simulation of the error of the estimation of number and brightness for a single cluster of molecules in the confocal mode. a, c Relative estimated numbers of molecules \(\hat{n}/n\) and b, d relative estimated brightness \(\hat{\lambda }/\lambda \). The number of molecules n varies from 1–50 with a brightness of \(\lambda =0.013\) (a, b). The brightness varies (0.005–0.035) while keeping the number of molecules constant \(n=10\) (c, d). The number of illumination pulses is \(2.5\cdot 10^4\) (a, b) and \(4\cdot 10^4\) (c, d). The FWHM of the PSF is 240 nm. Mean, quantiles (15, 85%) and analytically derived rel. standard deviations are shown.

In (7.6), data recorded in both the confocal and STED imaging mode of the same specimen are jointly optimized, which is advantageous in practical experiments. The confinement to only one of the imaging modes in the fit is straightforward. With the value of \(\gamma \) appropriately chosen, the penalization sufficiently stabilizes the solution of (7.6), preventing strong spatial oscillations in brightness on scales below the resolution. The weighting parameters \(\alpha _{im}\) are chosen such that all least-square residuals are approximately on the same scale (\(\alpha _{im}\simeq 1/\sum _{\mathbf {x}} Y_i^m(\mathbf {x})\)). In order to incorporate the non-negativity constraints, \(n(\mathbf {x})\) and \(\lambda (\mathbf {x})\) were substituted by squared variables \(m^2(\mathbf {x})\) and \(q^2(\mathbf {x})\) and (7.6) was solved for \(m(\mathbf {x})\) and \(q(\mathbf {x})\) instead. As starting point for the numerical optimization, a deconvolved single active detector image was chosen, which provided an initial molecular density for a given reasonable choice of the average molecular brightness.

The main cause of deviation between model and measurement is shot noise. Interestingly, the relative standard deviations (RSTD) of the estimated number of molecules \(\hat{n}\) and brightness \(\hat{\lambda }\) at a given position with molecular signals (such as from a single cluster) can be derived analytically (see [22])

$$\begin{aligned} \text{ RSTD }(\hat{n}) = \text{ RSTD }(\hat{\lambda }) \propto \sqrt{(n-1)/(n\lambda ^2t)} \end{aligned}$$
(7.7)

and is confirmed by simulations (see Fig. 7.1). Calculations and simulations both suggest that the RSTD is rather independent of the actual number of molecules and can be below \({\sim }20\%\) for conditions provided by synthetic fluorophores (\(\lambda \sim 0.01\), \(t>10^3\), i.e. more than 1000 pulses) [24].
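For a single isolated cluster located at the focal center (\(h=1\)), the model in (7.5) reduces to \(D_1=n\lambda \) and \(D_2=\tfrac{\beta }{2}\lambda ^2 n(n-1)\), which can be inverted in closed form: \(2D_2/(\beta D_1^2)=(n-1)/n\). The sketch below illustrates this special case only; it is not the FISTA-based fit of (7.6), and the function names are hypothetical. In an experiment, \(D_1\) and \(D_2\) would be replaced by the measured event counts divided by the number of pulses t.

```python
def model_cluster(n, lam, d=4):
    """Mean one- and two-detector events per pulse for one cluster at h = 1."""
    beta = (d - 1) / d
    d1 = n * lam
    d2 = 0.5 * beta * lam * lam * n * (n - 1)
    return d1, d2

def estimate_cluster(d1, d2, d=4):
    """Invert the low-order single-cluster model for number and brightness."""
    beta = (d - 1) / d
    ratio = 2.0 * d2 / (beta * d1 * d1)   # equals (n - 1) / n in the model
    if ratio >= 1.0:
        raise ValueError("coincidence rate too high for the low-order model")
    n_hat = 1.0 / (1.0 - ratio)
    lam_hat = d1 / n_hat
    return n_hat, lam_hat

d1, d2 = model_cluster(n=10, lam=0.013)
n_hat, lam_hat = estimate_cluster(d1, d2)
print(n_hat, lam_hat)
```

The ratio \((n-1)/n\) approaches one for large n, so small relative errors in the measured coincidence rate translate into large errors of \(\hat{n}\) when many molecules are present. Shot noise on \(D_2\) therefore dominates the uncertainty.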

Please note that in the imaging model (mean number of active detectors) in (7.5) and possible generalizations, higher-order terms represent convolutions of the molecular density with higher orders of the PSF \(h_m^n\). In principle, these terms represent parts of the image with an increased resolution; however, their contribution is rather low as they scale with higher orders of the molecular brightness. Measurements of these higher-order PSFs are displayed in Fig. 7.2.

Fig. 7.2

Figure adapted from [22]

The PSFs of 1–4 active detector events in confocal and STED mode. DNA origami with up to 24 ATTO 647N in two lines of 41 nm spacing was sparsely immobilized and measured in confocal (a, overlay of 64 single DNA images) and STED (b, overlay of 114 single DNA images) mode. On the right: normalized line profiles crossing the center of the PSFs. All overlays allowing a fit with a 2D Gaussian peak function display resulting FWHM values. H is the maximum value (events). Scale bars: 100 nm.

2.2 Intrinsic Molecular Brightness Calibration

Double-stranded DNA (dsDNA) is easy to synthesize and label in a controlled manner. Labeling the dsDNA with up to four ATTO 647N fluorophores and immobilizing it sparsely on a glass surface allows controlled measurements as well as independent estimates of the number of fluorophores by observing individual bleaching steps. A schematic of a microscope with a fluorescence detection path split into four equivalent units is shown in Fig. 7.3a. For a measurement in confocal mode with \(t=3000\) excitation pulses per pixel, resulting in 6000–7500 collected photons per fluorophore in the resulting scanned image, the number of fluorophores can be estimated with \(\sim \)20% uncertainty and matches the number of observed bleaching steps (see Fig. 7.3). Compared with observing abrupt changes in brightness by stepwise bleaching, the photon incidence analysis has two advantages: the molecules remain intact (except for the gradual bleaching during multiple scans), and it avoids the uncertainty about the number of simultaneously bleached molecules from which the observation of bleaching steps often suffers.

Fig. 7.3

Figure reproduced from [22]

Experimental setup and measurements on double-stranded DNA (dsDNA) labeled with up to four ATTO 647N. a Confocal/STED microscope equipped with four independent detection channels. BS 1:1 beam splitter, \(D_i\) i-th detector (i \(=\) 1–4). b Fluorescence bleaching steps of single dsDNA (corresponding spots are indicated in d by triangles of the same colors). c Comparison of the number of dye molecules (mean and s.d.) derived from photon incidence analysis with the detected number of bleaching steps of the same single dsDNA (red line: \(y = x\)). d, e Example image showing one- and two-photon detection events measured in the confocal mode. The dsDNA positions with only a single dye molecule are indicated by open triangles in (e). f Established map of the number of ATTO 647N molecules, derived from the data in (d) and (e). H is the maximum of the color scale, representing events in d and e and molecules in f. Scale bars: 1 \(\upmu \)m.

2.3 Counting Transferrin Receptors in HEK293 Cells

The bleaching rate of ATTO 647N in the experiments shown in the previous section (Figs. 7.2 and 7.3) is as low as \({\sim }3\%\) per full scan [22]. This allows a STED recording to be performed right afterwards, so that the photon statistics of both recordings can be combined and the optimization algorithm can use both datasets for the solution of (7.6). The combined data yields the molecular numbers, mostly derived from the statistical information contained in the confocal recordings, with a spatial 2D resolution of \({\sim }50\) nm mostly originating from the STED recording. Using the ability of fluorescence microscopes to look into the interior of cells, transferrin receptor (TfR) molecules in human embryonic kidney 293 (HEK293) cells were counted. The TfR molecules were labeled by an ATTO 647N-conjugated DNA aptamer because attaching a single fluorophore to each aptamer molecule can be performed with high precision [25]. With the aptamers designed to bind to their target in a one-to-one stoichiometry, quantification becomes straightforward. Images of the distribution of TfR clusters are shown in Fig. 7.4. Using information from the two active detector events, one can estimate that the total number of internalized TfRs in each cell is on the order of \({\sim }100 000\). In addition, this counting method can determine molecular density variations of internalized TfRs in the cell, which is not possible by conventional measurements such as quantitative Western blots. With the spatial resolution delivered by the STED recording, most intracellular TfRs can be visualized as separate clusters. Interestingly, the number of estimated TfRs in each isolated cluster closely follows an exponential distribution with an expectation of \({\sim }6.0 \pm 1.9\). This also indicates that a single cluster may have a capacity to accommodate more than 20 TfR molecules, since the measurement closely fits an exponential distribution up to 20 fluorophores.

Fig. 7.4

Figure reproduced from [22]

Counting the number of transferrin receptors (TfR) in HEK293 cells. Living cells were incubated with ATTO 647N-conjugated anti-TfR aptamer. After incubation, excess aptamer molecules were washed off and cells were chemically fixed. Stained receptors were imaged in confocal and STED mode, with 100 nm increments along z. a Summed axial projection of confocal and STED images (raw data) along 0.9 \(\upmu \)m depth. b Estimated 3D molecular map resulting from photon statistics of both confocal and STED recordings. Colors represent the axial position. c Isosurfaces of the molecular map in the boxed region in a and b. The isosurfaces include 70% of all molecules in this region. The number of molecules in each segment is displayed below. d The histogram of the estimated number of TfRs per recognized separated segment. The red line is an exponential fit of the number of occurrences up to 24 molecules per spot (inset shows the residual of the fit). Scale bars: 1 \(\upmu \)m.

3 Mean and Variance in RESOLFT Nanoscopy

RESOLFT experiments require switchable fluorophores with on- and off-states [2]. For simplicity, only the positive imaging mode is discussed here, i.e., where the MCF features a sharp, subdiffraction-sized positive signal peak. This is the commonly used imaging mode. Another common assumption is that the molecules independently switch between their states and independently emit fluorescence.

Two main steps can be differentiated. At first, fluorophores are transferred from the off-state, in which they reside initially, into the on-state only within a sharp, subdiffraction-sized region centered at the current scanning position \(\mathbf {s}\). In practice, this is achieved by first activating fluorophores with a diffraction-limited focus, and then deactivating fluorophores in the periphery using a doughnut-shaped focus with a central intensity minimum [6]. The spatial distribution of the probability of a single fluorophore to be effectively activated after the first step is referred to as the activation probability \(p(\mathbf {x})\). As a second main step of RESOLFT, activated fluorophores are then read out by recording their fluorescence signal. The spatial distribution of the average recorded fluorescence photon signal of a single activated fluorophore during the readout time is \(\lambda (\mathbf {x})\), which is typically proportional to the confocal PSF; the readout can, however, also be performed in STED mode [26].

The MCF of RESOLFT nanoscopy is therefore given by the activation probability multiplied by the average readout signal,

$$\begin{aligned} \text {MCF}(\mathbf {x}) = p(\mathbf {x}) \lambda (\mathbf {x}). \end{aligned}$$
(7.8)

In the following, the influence of the switching step on the obtained photon statistics is studied. To demonstrate the principal statistical properties of the two-step imaging process in RESOLFT, a simplified model is analyzed.

3.1 Cumulants of the Fluorescence of Switchable Fluorophores

Let us begin by assuming n molecules, all with the same activation probability p and the same brightness \(\lambda \), which is the average signal that is obtained from a single activated fluorophore. Further, let us assume that there is no spatial dependency in the measurement process (e.g. a situation of an isolated cluster of molecules where the scanning position and the cluster position coincide). There are three unknown properties (number of molecules, activation probability and brightness) but only a single variable (fluorescence signal) that is measured. How can the unknown properties be retrieved with only a single measurement variable? The basic idea is to exploit higher-order characteristics (cumulants) of the measured signal that go beyond the mean in order to separate the unknown properties. Figure 7.5 illustrates this idea, showing that different choices of parameters resulting in the same mean signal can still differ in their variance (or other higher-order characteristics). For a purely Poisson-distributed signal, all cumulants revert to the same value, the mean of the signal. It turns out that the switching step in RESOLFT nanoscopy is the essential factor in separating the number and the brightness.

Fig. 7.5

Figure adapted from [27]

Left: Depiction of a cluster of \(n=20\) molecules with different activation probabilities p and different brightnesses \(\lambda \). Right: Photon counting histogram. The mean remains unchanged for all cases, while the variance differs. Calculating statistical parameters of the measured photon numbers allows estimating the number of molecules.

The goal is to calculate the cumulants of the fluorescence signal of a single molecule, given p and \(\lambda \). The signal can be modeled as the result of a two-step stochastic process, where the first step is the activation and the second step is the photon emission by the activated fluorophore. The activation is modeled as a Bernoulli process with activation probability p. The activation state of the fluorophore is represented by a Bernoulli random variable A. If the fluorophore is in the on-state it will yield \(\lambda \) detected photons on average, which is modeled by a Poisson distribution. Therefore, the random variable describing the signal of a fluorophore B given the activation state A is

$$\begin{aligned} B|A=1\sim & {} \text{ Poisson }(\lambda ) \nonumber \\ B|A=0\sim & {} 0 \end{aligned}$$
(7.9)

The characteristic function \(\phi (t)\) of a Bernoulli-distributed random variable is \(1-p+p\exp {(it)}\) and of a Poisson-distributed random variable \(\exp {(\lambda (e^{it}-1))}\). The characteristic function of B can be computed with a conditional expectation over A.

$$\begin{aligned} \phi (t) = \mathbb {E}_A \mathbb {E}_{B|A} \exp {\left( itB\right) } = 1-p+p\exp {\left( \lambda (e^{it}-1)\right) } \end{aligned}$$
(7.10)

where \(\mathbb {E}\) represents the expectation. The cumulant-generating function H(t) is the logarithm of \(\phi \) with a series expansion giving the cumulants

$$\begin{aligned} H(t)=\log \phi (t) = \sum _{n=1}^{\infty }\kappa _n\frac{(it)^n}{n!}. \end{aligned}$$
(7.11)

The cumulants are polynomials with increasing order in p and \(\lambda \).

$$\begin{aligned} \kappa _1= & {} p\lambda \nonumber \\ \kappa _2= & {} p\lambda +p(1-p)\lambda ^2 \\ \kappa _3= & {} p\lambda +3p(1-p)\lambda ^2+p(1-p)(1-2p)\lambda ^3 \nonumber \end{aligned}$$
(7.12)

The molecules are assumed to act independently from each other, so that the cumulant of the summed signal of all n molecules can conveniently be expressed as the sum of single-molecule cumulants. This means that the expressions in (7.12) are simply scaled by n.
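A short numerical example shows how the variance separates parameter sets with identical mean signal. The sketch evaluates \(n\kappa _1\) and \(n\kappa _2\) for a cluster of \(n=20\) molecules with three assumed combinations of p and \(\lambda \) whose product is the same; the specific values are illustrative.

```python
def cluster_mean_variance(n, p, lam):
    """Mean and variance of the summed signal of n independent switchable molecules."""
    mean = n * p * lam
    variance = n * (p * lam + p * (1.0 - p) * lam ** 2)
    return mean, variance

# Assumed parameter sets, all with p * lambda = 1.
cases = [(20, 1.0, 1.0), (20, 0.5, 2.0), (20, 0.1, 10.0)]
results = [cluster_mean_variance(n, p, lam) for n, p, lam in cases]
for (n, p, lam), (mean, variance) in zip(cases, results):
    print(f"p={p:4.2f} lambda={lam:5.1f} mean={mean:6.2f} variance={variance:7.2f}")
```

All three cases have a mean of 20 photons, while the variance grows from 20 (the pure Poisson value for \(p=1\)) to 40 and 200 as the activation probability decreases and the brightness increases.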

If the first three cumulants of the summed signal are estimated empirically as \(\hat{\kappa }_{1,2,3}\), one can solve the non-linear equation system given by (7.12), scaled by n, to obtain estimates \(\hat{n}, \hat{p}, \hat{\lambda }\) for, most importantly, the number of molecules, as well as for the activation probability and the brightness.

$$\begin{aligned} \hat{n}= & {} \hat{\kappa _1}^2\left( \hat{\kappa _1}-\hat{\kappa _2}\right) /\left( \hat{\kappa _1}^2-\hat{\kappa _1}\hat{\kappa _2}-\hat{\kappa _2}^2+\hat{\kappa _1}\hat{\kappa _3}\right) \nonumber \\ \hat{p}= & {} \left( \hat{\kappa _1}^2-\hat{\kappa _1}\hat{\kappa _2}-\hat{\kappa _2}^2+\hat{\kappa _1}\hat{\kappa _3}\right) /\left( \hat{\kappa _1}\hat{\kappa _2}-2\hat{\kappa _2}^2+\hat{\kappa _1}\hat{\kappa _3}\right) \\ \hat{\lambda }= & {} \left( \hat{\kappa _1}\hat{\kappa _2}-2\hat{\kappa _2}^2+\hat{\kappa _1}\hat{\kappa _3}\right) / \left( \hat{\kappa _1}\left( \hat{\kappa _1}-\hat{\kappa _2}\right) \right) \nonumber \end{aligned}$$
(7.13)
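The inversion in (7.13) can be illustrated with a round trip: generate the cumulants of the summed signal for known n, p and \(\lambda \), then recover the parameters. This is a sketch with hypothetical function names; with real data the cumulants would be noisy sample estimates, and the denominators can become small or change sign.

```python
def summed_cumulants(n, p, lam):
    """Cumulants of the summed signal of n independent, identical molecules."""
    q = p * (1.0 - p)
    k1 = n * p * lam
    k2 = n * (p * lam + q * lam**2)
    k3 = n * (p * lam + 3.0 * q * lam**2 + q * (1.0 - 2.0 * p) * lam**3)
    return k1, k2, k3

def estimate_n_p_lambda(k1, k2, k3):
    """Solve the cumulant equations for (n, p, lambda) as written in (7.13)."""
    a = k1**2 - k1 * k2 - k2**2 + k1 * k3
    c = k1 * k2 - 2.0 * k2**2 + k1 * k3
    d = k1 * (k1 - k2)
    if a == 0.0 or c == 0.0 or d == 0.0:
        # Happens e.g. for purely Poissonian data (k1 == k2), where n and
        # the brightness cannot be separated.
        raise ValueError("cumulants do not determine n, p and lambda")
    n_hat = k1**2 * (k1 - k2) / a
    p_hat = a / c
    lam_hat = c / d
    return n_hat, p_hat, lam_hat
```

For example, `estimate_n_p_lambda(*summed_cumulants(20, 0.2, 3.0))` should return values close to 20, 0.2 and 3.0.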

This principle is exploited in the next section for a realistic RESOLFT imaging model.

3.2 Statistical Model

In the general case, the activation probability and the signal in the active state are position-dependent. The following transformations hold

$$\begin{aligned} p\rightarrow & {} p(\mathbf {x}_i - \mathbf {s}) \nonumber \\ \lambda\rightarrow & {} \lambda (\mathbf {x}_i - \mathbf {s}) \nonumber \end{aligned}$$

with \(\mathbf {x}_i\) the position of the molecules and \(\mathbf {s}\) the scanning position. Again, the signal is a result of a two-stage stochastic process. The activation strength and the average readout signal, however, both depend strongly on the position of the molecules relative to the focal center. Rewriting the mean \(m(\mathbf {s})\) and variance \(v(\mathbf {s})\) of the total signal using the results of the previous section (see (7.12)) with position-dependent parameters and adding a Poissonian background \(d(\mathbf {s})\) gives

$$\begin{aligned} m(\mathbf {s})= & {} \sum _{i=1}^N p(\mathbf {x}_i - \mathbf {s}) \lambda (\mathbf {x}_i - \mathbf {s}) + d(\mathbf {s}) \nonumber \\ v(\mathbf {s})= & {} m(\mathbf {s}) + \sum _{i=1}^N p(\mathbf {x}_i - \mathbf {s}) \left( 1-p(\mathbf {x}_i - \mathbf {s})\right) \lambda ^2(\mathbf {x}_i - \mathbf {s}) \end{aligned}$$
(7.14)
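The sums in (7.14) can be evaluated directly for a given set of molecule positions. The sketch below works in one dimension for brevity and takes the profiles p and \(\lambda \) as arbitrary callables; the function name and argument layout are assumptions made for illustration.

```python
def mean_and_variance(s, positions, p_of, lam_of, background=0.0):
    """Mean m(s) and variance v(s) of the total signal at scan position s.

    positions  : iterable of molecule coordinates x_i (1D here for simplicity)
    p_of       : callable, activation probability as a function of x_i - s
    lam_of     : callable, mean detected photons in the on-state vs. x_i - s
    background : Poissonian background d(s) at this scan position
    """
    m = background
    excess = 0.0
    for x in positions:
        p = p_of(x - s)
        lam = lam_of(x - s)
        m += p * lam                          # contribution to the mean
        excess += p * (1.0 - p) * lam * lam   # over-dispersion term
    # Poisson part (equal to the mean) plus the excess from stochastic activation.
    return m, m + excess
```

With \(p=1\) everywhere the excess term is zero and the variance equals the mean, as expected for a purely Poissonian signal.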

The transition to a continuous grid can be carried out analogously to the procedure in Sect. 7.2.1. With the transformation

$$\begin{aligned} \sum _{i=1}^N p(\mathbf {x}_i - \mathbf {s}) \lambda (\mathbf {x}_i - \mathbf {s}) \rightarrow \left( n *p \lambda \right) (\mathbf {s})\nonumber \end{aligned}$$

the mean and variance of the total signal can conveniently be expressed as convolutions with the density of fluorophores \(n(\mathbf {x})\). Additionally, the readout signal \(\lambda \) is written as a product of the focal brightness b (i.e. \(\lambda (0)\)) and the readout PSF \(h_{\text {read}}\). The focal brightness is assumed constant for all the molecules.

$$\begin{aligned} m(\mathbf {s})= & {} b \left( n * h_{\text {read}}p\right) (\mathbf {s}) + d(\mathbf {s}), \nonumber \\ v(\mathbf {s})= & {} m(\mathbf {s}) + b^2 \left( n * h_{\text {read}}^2 p (1-p)\right) (\mathbf {s}). \end{aligned}$$
(7.15)

The mean is the convolution of the number density with the MCF and background contributions while the variance exceeds the mean by an additional term with a convolution kernel \(b^2h_{\text {read}}^2p\left( 1-p\right) \) which is similar but not identical to the MCF. Note that for \(p(\mathbf {x})=1\) the situation of non-switchable fluorophores is recovered, the excess variance term vanishes and a purely Poisson distributed signal is regained. In the case of photo-switchable fluorophores, the variance of the collected signal is augmented due to the stochastic nature of the activation in the preparation step. The excess variance or ‘over-dispersion’ term is proportional to the square of the focal brightness b, unlike the mean signal, which scales linearly with it. This relation can be used to estimate b directly from integrated mean and variance of the image data using a Method of Moments estimator (MME).

For a region X in the sample that comprises a conglomeration of molecules but is isolated from other such regions, the position-dependent mean and variance of the signal can be integrated over the region (\(M=\int _X m(\mathbf {s})\, d\mathbf {s}\), \(V=\int _X v(\mathbf {s})\, d\mathbf {s}\)) and the convolutions in (7.15) reduce to simple products.

$$\begin{aligned} M&= b N_X H_1 + D, \nonumber \\ V&= M + b^2 N_X H_2, \end{aligned}$$
(7.16)

with \(H_1= \int _{\mathbb {R}^2} (h_{\text {read}}p)\, d\mathbf {s}\) and \(H_2= \int _{\mathbb {R}^2} \left( h_{\text {read}}^2 p (1 - p)\right) \, d\mathbf {s}\) being integrals of products of p and \(h_{\text {read}}\) and \(D=\int _Xd\, d\mathbf {s}\) the integrated background. If the integrated mean and variance can be estimated empirically as \(\hat{M}\) and \(\hat{V}\) then (7.16) can be solved for the focal brightness resulting in the MME \(\hat{b}\):

$$\begin{aligned} \hat{b} = \frac{H_1}{H_2} \frac{\hat{V} - \hat{M}}{\hat{M} - D}. \end{aligned}$$
(7.17)
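The estimator (7.17) can be checked against the forward model (7.16) in a few lines. The sketch below uses hypothetical function names; the expression for the number of molecules is not stated explicitly above but follows from rearranging the first line of (7.16).

```python
def integrated_moments(b, n_x, h1, h2, d):
    """Forward model (7.16): integrated mean M and variance V of a region."""
    m = b * n_x * h1 + d
    v = m + b * b * n_x * h2
    return m, v

def estimate_brightness(m_hat, v_hat, h1, h2, d):
    """Method-of-moments estimate of the focal brightness b, as in (7.17)."""
    signal = m_hat - d
    if signal <= 0.0:
        raise ValueError("background-corrected mean must be positive")
    return (h1 / h2) * (v_hat - m_hat) / signal

def estimate_number(m_hat, b_hat, h1, d):
    """Number of molecules from M = b * N_X * H_1 + D, given an estimate of b."""
    return (m_hat - d) / (b_hat * h1)
```

In practice \(\hat{M}\) and \(\hat{V}\) would be obtained from repeated recordings of the same region, so the recovered values scatter around the true ones.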

The results of a simulation of the RESOLFT imaging process and the estimation of the focal brightness are shown in Fig. 7.6. Essentially, the relative errors of \(\hat{b}\) and \(\hat{n}\) depend mainly on the number of measurements, and are largely independent of the number of molecules in the image [27].

Fig. 7.6

Figure adapted from [27]

Simulation of the error of the estimation of number and brightness for a single cluster of molecules. a Relative standard deviation of the estimated number of molecules (for 20 molecules in a single cluster) with an activation probability of 20% and a variable number of repetitions. b Relative standard deviation of the estimated brightness under the same conditions. Above a threshold on the molecular brightness, the counting error mainly depends on the number of repetitions.

3.3 Counting rsEGFP2 Fused \(\alpha \)-tubulin Units in Drosophila Melanogaster

We illustrate the approach for RESOLFT nanoscopy imaging of Drosophila melanogaster larvae body wall muscles ubiquitously expressing rsEGFP2, a reversibly photoswitchable fluorescent protein (RSFP) [6], fused to \(\alpha \)-tubulin. A commercial RESOLFT nanoscope was used for the recordings, and the preparation of the sample is described in [28]. Tubulin is known to form helices with a diameter of approximately 25 nm, each turn comprising 13 dimers of \(\alpha \)- and \(\beta \)-tubulin that are spaced 8 nm apart [29]. The total density of \(\alpha \)-tubulin along a single filament can thus be estimated to be \(\approx \)1625 per \(\upmu \)m. To analyze the ratio of labeled to non-labeled \(\alpha \)-tubulin subunits in body wall muscles of Drosophila melanogaster, L3-larvae were dissected to isolate body wall muscles, which were subsequently used as a sample for Western blot analysis. The ratio of rsEGFP2-\(\alpha \)-tubulin to \(\alpha \)-tubulin in the muscle tissue was estimated to be \({\sim }\)1:6. Therefore, the number density of rsEGFP2 molecules along a microtubule fiber is expected to be \({\sim }230\) per \(\upmu \)m. Can a RESOLFT imaging experiment confirm this independently measured molecule density using the framework laid out in Sect. 7.3.2?
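The expected densities quoted above follow from simple arithmetic, sketched here. The sketch assumes that the ratio of 1:6 is read as labeled to unlabeled subunits, i.e. a labeled fraction of 1/7, which is the reading that reproduces the quoted value of about 230 per \(\upmu \)m; the function names are illustrative.

```python
def alpha_tubulin_per_um(dimers_per_turn=13, axial_spacing_nm=8.0):
    """Alpha-tubulin subunits per micrometre of filament (one per dimer)."""
    return dimers_per_turn / axial_spacing_nm * 1000.0

def labeled_per_um(total_per_um, labeled=1.0, unlabeled=6.0):
    """Expected labeled subunits per micrometre.

    Assumption: the Western blot ratio is labeled : unlabeled, so the
    labeled fraction is labeled / (labeled + unlabeled).
    """
    return total_per_um * labeled / (labeled + unlabeled)

total = alpha_tubulin_per_um()       # 1625.0 per micrometre
expected = labeled_per_um(total)     # about 232 per micrometre
```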

3.3.1 Switching Kinetics of rsEGFP2

Measurements of the switching kinetics of rsEGFP2 were conducted on a confocal point-scanning microscope with additional widefield illumination paths for excitation at 491 nm and activation at 375 nm (both continuous wave). The signal was detected with a point detector at \(525\pm 25\) nm. The advantage of widefield illumination and point-like detection is that averaging of the observed kinetics over inhomogeneous intensity distributions within the detected volume can be excluded. rsEGFP2 is activated by UV light and switches off while being read out. Consequently, fluorescence can also be recorded while the molecules are being switched off, which is the effective activation step in RESOLFT.

In this measurement, molecules were activated by applying the UV light for a comparatively long time (several ms), thus strongly saturating the activation. Assuming that this prepares all the molecules in the on-state, the fluorescence signal at any given time during the off-switching indicates the remaining on-state population of rsEGFP2 at that time. An example of an off-switching curve is shown in Fig. 7.7. Systematic deviations from a single exponential decay are evident, especially for intermediate times, motivating the use of a Gamma distributed decay model with parameters \(\alpha \) and \(\beta \) for fitting the off-switching curves.

$$\begin{aligned} s(t) = A\frac{\beta ^\alpha }{(\beta +t)^\alpha }+b \end{aligned}$$
(7.18)

with the total switching rate k given by the inverse of the time \(\beta (\exp {(1/\alpha )}-1)\) at which the signal drops to 1/e of the amplitude. The equilibrium signal after complete switching, normalized by the initial signal, estimates the equilibrium population \(\rho _\infty \). It is \({\sim }2.5\%\), independent of the excitation strength. The switching rate depends approximately linearly on the excitation light intensity for intensities up to a few kW/cm\(^2\). These are values typical for light intensities close to the center of the focal intensity minima, which determine the resolution enhancement in a RESOLFT microscope.
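The decay model (7.18) and the resulting switching rate can be written compactly as follows. This is a minimal sketch with hypothetical function names; the fit to measured curves is not shown.

```python
import math

def off_switching_signal(t, amplitude, alpha, beta, offset):
    """Gamma-distributed decay model s(t) = A * beta^alpha / (beta + t)^alpha + b."""
    return amplitude * (beta / (beta + t)) ** alpha + offset

def one_over_e_time(alpha, beta):
    """Time at which the decaying part has dropped to 1/e of its amplitude."""
    return beta * math.expm1(1.0 / alpha)  # beta * (exp(1/alpha) - 1)

def switching_rate(alpha, beta):
    """Total switching rate k, defined as the inverse of the 1/e time."""
    return 1.0 / one_over_e_time(alpha, beta)
```

Evaluating `off_switching_signal` at `one_over_e_time(alpha, beta)` and subtracting the offset gives `amplitude / e`, which is the definition of k used above.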

Fig. 7.7

Figure reproduced from [27]

Switching kinetics of rsEGFP2 in living E. coli cells overexpressing rsEGFP2 on a coverslip in an agarose gel. a Single switching curve with an off-switching light intensity of 500 W/cm\(^2\). Fits with a single exponential decay (red) and a gamma distributed exponential decay (yellow) with residuals shown below. b Estimated switching rate k as a function of the off-switching light intensity. c Equilibrium on-state level as a function of the off-switching light intensity.

3.3.2 Shape of the MCF

The MCF for rsEGFP2 molecules is modeled as a simple two-state switching process (‘on’/‘off’), with switching rates depending linearly on the applied light intensity and the frequently-made approximation that the doughnut of off-switching light features a parabolic intensity distribution close to the focal center [30]. The shape of the MCF is then a superposition of two Gaussian peaks that differ only in amplitude and width:

$$\begin{aligned} \text {MCF}(r) = b\left( \rho _\infty e^{-4\ln (2) r^2 w_\text {read}^{-2}} + \left( p(0) - \rho _\infty \right) e^{-4\ln (2) r^2 w_\text {RESOLFT}^{-2}}\right) \end{aligned}$$
(7.19)

with the equilibrium activation level \(\rho _\infty \) and the focal activation level p(0). The broader peak with diffraction-limited resolution \(w_{\text {read}}\) and low amplitude represents the signal from the nonzero equilibrium activation (see Fig. 7.7c) while the sharp peak with increased resolution \(w_{\text {RESOLFT}}\) represents the usable RESOLFT signal that originates from the subdiffraction-sized spot with an effective activation level above the equilibrium value. While the absolute scaling factor, the focal brightness b, is generally very difficult to predict, the relative amplitudes and sizes of the two Gaussian spots forming the MCF can typically be modeled reasonably well or retrieved from the data itself. Here, the structure consists of microtubule fibers with a width well below the resolution. For a sufficiently sparse distribution of filaments and high enough signal-to-noise ratio, the object can approximately be estimated from the data as a set of curved lines, even without quantitative knowledge of the MCF, simply by detecting lines in the image [31]. With the given image data and estimated object the shape of the MCF can be retrieved by a deconvolution and a fit of the shape model to the deconvolution result. Image data and retrieved object are shown in Fig. 7.8. The estimated FWHM of the diffraction-limited peak was 235 nm, and of the sharp RESOLFT peak 73 nm, and with the knowledge of the equilibrium activation level of \({\sim }2.5\%\) the focal activation level can be estimated to be \({\sim }17\%\).
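The two-Gaussian shape (7.19) and its integral over the image plane can be sketched as below. The closed form uses the standard result that a two-dimensional Gaussian of unit height and full width at half maximum w has the integral \(\pi w^2/(4\ln 2)\). Function names are illustrative, and the result is in units of the brightness times the squared length unit of the widths.

```python
import math

def _gauss(r, fwhm):
    """Unit-height Gaussian with the given full width at half maximum."""
    return math.exp(-4.0 * math.log(2.0) * r * r / (fwhm * fwhm))

def mcf(r, b, rho_inf, p0, w_read, w_resolft):
    """Two-Gaussian model of the MCF as in (7.19), at radial distance r."""
    return b * (rho_inf * _gauss(r, w_read)
                + (p0 - rho_inf) * _gauss(r, w_resolft))

def mcf_integral_2d(b, rho_inf, p0, w_read, w_resolft):
    """Integral of the MCF over the plane, using pi * w^2 / (4 ln 2) per peak."""
    def area(w):
        return math.pi * w * w / (4.0 * math.log(2.0))
    return b * (rho_inf * area(w_read) + (p0 - rho_inf) * area(w_resolft))

# Shape parameters quoted in the text (widths in nm); b = 1 leaves the scale open.
shape_integral = mcf_integral_2d(1.0, 0.025, 0.17, 235.0, 73.0)
```

At r = 0 the model returns b p(0), so the focal activation level sets the peak height. Converting the integral to a photon count per molecule summed over image pixels would additionally require the pixel area, which is not specified in this section.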

Fig. 7.8

Figure reproduced from [27]

Counting \(\alpha \)-tubulin units in Drosophila melanogaster. a RESOLFT image of dissected body wall muscle cells expressing rsEGFP2-\(\alpha \)-tubulin with regions (A–E) in which the number of rsEGFP2 molecules was counted. b Line detection of the tubulin structure. c, d PSF shape detection based on the image (a), the object detection (b) and the equilibrium on-state population (see Fig. 7.7). Scale bars, 1 \(\upmu \)m (a, b), 200 nm (c).

3.3.3 Counting the Number of Molecules Along Filaments

Applying the method of moments estimator for the focal brightness given in (7.17) to the whole image shown in Fig. 7.8a yields a value of \(\hat{b}\) of \({\sim }0.9\) photon counts per activated rsEGFP2 molecule. Note that the readout time in this experiment was set to a very short duration because rsEGFP2 switches off during readout. There are, however, RSFPs that decouple the off-switching and the readout [32, 33]. The total average contribution of each fluorophore to the RESOLFT image can be estimated by integrating the MCF given by (7.19) using the estimated shape and estimated focal brightness. The resulting average value of 3.58 photons per rsEGFP2 molecule can then be used to quantify the number of fluorophores in a region in the image. The results of applying the estimator to the regions marked in Fig. 7.8a are shown in Table 7.1. The average number of rsEGFP2 molecules per \(\upmu \)m along a microtubule filament is \({\sim }180\), which is similar to but below the expected number density. This might hint at a less efficient incorporation of rsEGFP2 fused to \(\alpha \)-tubulin in microtubules. A detailed description of this experiment as well as a theory for the counting error is given in [27].

Table 7.1 Analysis of the regions A–E in the number density shown in Fig. 7.8a

4 Summary

Ensemble-based, coordinate-targeted fluorescence microscopy methods like STED or RESOLFT deliver spatial resolution on the nanometer scale [34]. Registering all the molecules at a certain position at the same time in principle provides an advantage of recording speed. The repeatability and live-cell compatibility make quantitative applications of STED and RESOLFT highly desirable. Reaching this goal, however, has so far been challenging, in large measure because the observed Poisson statistics of the fluorescence signal does not lend itself to separating the number of the molecules from their brightness.

Fortunately, effects like photon ‘antibunching’ or an additional on-switching step of fluorophores manifest themselves as higher-order terms in the observed photon statistics. A careful statistical analysis and modeling of the imaging process reveals that these higher-order components differ mostly by containing higher powers of the molecular brightness. Estimating these components empirically makes it possible to solve the inverse problem and retrieve the molecular density and brightness from the observed non-Poissonian photon statistics.

The statistical models for both investigated effects for STED and RESOLFT nanoscopy are remarkably similar. However, as higher-order effects, their contribution to the signal is usually only weak and a sufficiently high signal-to-noise ratio is required for the described methods to yield precise results. In particular, the molecules should not photobleach during the experiment.

A central assumption is the independence and uniformity of behavior of the individual molecules, at least locally within a region of interest. A violation of this assumption or a compromised (false) estimation of the shape of the MCF will result in a biased counting procedure.

This chapter demonstrated that modeling and analysis of the obtained photon statistics are key elements to achieving quantitative subdiffraction-resolution information in STED and RESOLFT nanoscopy.