Photonic Imaging with Statistical Guarantees: From Multiscale Testing to Multiscale Estimation

In this chapter we discuss how to obtain statistical guarantees in photonic imaging. We start with an introduction to hypothesis testing in the context of imaging; more precisely, we describe how to test whether a specific region of interest (RoI) contains signal or just noise. Afterwards we extend this approach to a family of RoIs and examine the problems that arise, such as inflation of the type I error and dependency issues. We discuss how to control the family-wise error rate by different modifications, and provide a connection to extreme value theory. We then present possible extensions to inverse problems. Moving from testing to estimation, we finally introduce a method which constructs an estimator of the desired quantity of interest with automatic smoothness guarantees.


Introduction
The analysis of a photonic image typically involves a reconstruction of the measured object of interest, which then becomes the subject of further evaluation. Although this approach is frequently employed in photonic image analysis, it can be quite problematic for several reasons.
1. As the image is noisy and often inherently random, a full reconstruction relies on the choice of a regularisation functional and corresponding a priori assumptions on the image, often implicitly hidden in a reconstruction algorithm. Related to this, the reconstruction relies on the choice of one or several tuning parameters. A proper choice is a delicate task, in particular when the noise level is high and/or inhomogeneous.
2. The sizes of the objects might be below the resolution of the optical device, which further hinders a full reconstruction.
3. As the resolution increases, the object to be recovered becomes random in itself, as its fine structure then depends on, e.g., the conformational states of a protein, and the interpretation of the recovered object might be an issue.
It is the aim of this chapter to provide a careful discussion of such issues and to address the analysis of photonic images with statistical guarantees. This will be done in two steps. In Sect. 11.2 we survey some recent methodology which circumvents a full recovery of the image and extracts certain relevant information in difficult situations such as those mentioned above. Based on this (see Sect. 11.3), we will extend such methods also to situations in which a full reconstruction is reasonable but still a difficult task, e.g., when the multiscale nature of the object has to be recovered. In both scenarios we will put a particular emphasis on statistical guarantees for the provided methods. An example where a full recovery of the object of interest is typically not a viable task is depicted in the centre of Fig. 11.1, where a detail of a much larger image is shown (see Fig. 1 in [1] for the full image). The investigated specimen consists of DNA origami which have been designed in such a way that each of the signal clusters contains up to 24 fluorescent markers, arrayed in two strands of up to 12, having a distance of 71 nanometers (nm) (see the left panel of Fig. 11.1 for a sketch of such a DNA origami). As the ground truth is basically known, this serves as a real-world phantom.
Data were recorded with a STED (STimulated Emission Depletion) microscope at the lab of Stefan Hell of the Department of NanoBiophotonics of the Max Planck Institute for Biophysical Chemistry. In contrast to classical fluorescence microscopy, the resolution in STED microscopy is in theory not limited and can be enhanced by increasing the intensity of the depletion laser [2]. However, this increase comes at the price of a decrease in intensity of the focal spot, which bounds the resolution in practice. Therefore a convolution of the underlying signal with the PSF of the STED microscope is unavoidable, and a full reconstruction of the DNA origami (or the shape of the markers) appears to be difficult. However, for most purposes this is also not relevant. Instead, less ambitious tasks will still provide important information, e.g., the location of these fluorescent markers. This can be achieved via a statistical test, which is presumably a much simpler task than reconstruction (estimation in statistical terminology), and it can be tailored towards answering particular questions such as "How many strands of markers are there?" and "Where are the DNA origamis located?". The right panel of Fig. 11.1 shows the locations of markers as found by such a statistical test (from the data in the middle panel of Fig. 11.1), which will be introduced later on.

Fig. 11.1 (cf. Fig. 1 in [1]) Left: sketch of a single DNA origami; middle: detail of an image of randomly distributed DNA origami; right: detected strands of markers

Statistical Hypothesis Testing
We will see that proper testing in the above example (Fig. 11.1) is already a complex task. Therefore, in this section, we first introduce the concept of statistical testing in a basic setting. The first step in statistical hypothesis testing is to define the so-called null hypothesis H and the alternative hypothesis K:

H : "Hypothesis to be disproven"  vs.  K : "Hypothesis to be substantiated".
For example, H might correspond to the hypothesis that no marker is contained in a certain given region of the image, while K corresponds to the contrary, that there is at least one marker in this region. A statistical significance test is a decision rule which, based on given data, allows one to discriminate between H and K. If a certain criterion is met, H is rejected and K is assumed; if not, H cannot be rejected. For instance, the photon count in a certain given region of a noisy image may give rise to the belief that at least one marker is contained therein. This could be tested, for example, by checking whether the total number of photons detected in this region is larger than a certain threshold. However, due to the randomness involved in photon emissions and background noise, such a finding is associated with a certain risk of being incorrect. A statistical test aims to control this risk. Hence, prior to performing a statistical test, a tolerable risk α is specified, typically in the range of 0.01 up to 0.1, corresponding to accepting the error rate that, on average, in at most α · 100% of the cases the null hypothesis H is falsely rejected. Such an α is called the significance level. This is written as

P("H is rejected although H is true") ≤ α.    (11.1)

Here, P stands generically for all possible distributions under H, and P(A) denotes the probability of an event A. If the test criterion is chosen such that (11.1) holds, the corresponding test is called a level-α-test. The ability of a test to correctly reject H is called detection power. If H corresponds to the hypothesis that no marker is located in a certain given region, the test (i.e., the data-based decision procedure) is then constructed in such a way that the probability α to falsely detect a marker in an empty region is controlled. H and K are chosen in such a way that the false rejection of H is considered the more serious error and is controlled in advance.
In our scenario, this means that we consider wrong detection of a fluorophore as the more serious error than missing a fluorophore.

A Simple Example
To demonstrate this concept more rigorously, we now consider a very simple Gaussian model, which can be seen as a proxy for more complicated models. Assume that one observes data

Y_i = μ_i + ε_i,  i = 1, . . . , n,    (11.2)

where μ_i ≥ 0 denote possible "signals" hidden in the observations Y_i, and ε_i ∼ N(0, 1) are independent normal random variables with variance σ² = 1 (for simplicity). Assume for the moment that all signals have the same strength, μ_i ≡ μ ≥ 0. The interest lies in establishing that μ > 0, i.e., the presence of such a signal in the data. Hence, we set

H : μ = 0 (to be disproven)  vs.  K : μ > 0 (to be substantiated).    (11.3)

The goal is now to find a suitable criterion which, given Y_1, . . . , Y_n, allows one to decide in favour of or against H in such a way that the error of wrongly rejecting H is controlled by α. From a statistical perspective the aim is to infer about the mean of the Y_i, which should be close to the empirical mean Ȳ = (1/n) Σ_{i=1}^n Y_i of the data. An intuitive decision rule would be to check whether Ȳ is "clearly" larger than zero, Ȳ > γ_α, say, for a suitable threshold γ_α > 0. We consider the normalized (i.e., with unit variance) sum √n Ȳ = (1/√n) Σ_{i=1}^n Y_i and choose, for prescribed α > 0, the threshold γ_α such that we have equality in (11.1). As under H we have μ = 0 and hence √n Ȳ ∼ N(0, 1), this gives γ_α = z_{1−α}/√n, where z_{1−α} = Φ^{−1}(1 − α) is the (1 − α)-quantile of the standard normal distribution. Here Φ denotes the cumulative distribution function of a standard normal random variable, Φ(x) = (2π)^{−1/2} ∫_{−∞}^x e^{−t²/2} dt. For μ > 0, the detection power of this test is 1 − Φ(z_{1−α} − √n μ), and using the Gaussian tail bound 1 − Φ(x) ≤ e^{−x²/2}/(x √(2π)) (see, e.g., [3], inequality (1.8)), we obtain, for sufficiently large n, that this power tends to one. This means that, if the number n of data points grows, the detection power (in the case μ > 0) of the Z-test converges to 1 exponentially fast. This test has been derived in an intuitive way, but it can be proven that it is a uniformly most powerful (UMP) test (see [4], Chap. 3.4). This means that for all μ > 0 (i.e., whenever the alternative K holds) the detection power is maximized among all level-α-tests.

Z-test
Comparison of the normalized empirical mean of a set of measurements to a given threshold to assess a difference in location from a given constant μ_0. For μ_0 = 0 the Z-test rejects H if √n Ȳ > z_{1−α}. This is the best possible test at level α if the data Y_1, . . . , Y_n are independent and N(μ, 1)-distributed.
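The Z-test is easily illustrated numerically. The following Python sketch (with illustrative choices n = 25, μ = 0.5 and α = 0.05, which are not taken from the text) estimates the empirical type I error under H and the detection power under K by Monte-Carlo simulation:

```python
import math
import random
from statistics import NormalDist

def z_test(data, alpha=0.05):
    """One-sided Z-test of H: mu = 0 vs. K: mu > 0 for N(mu, 1) data.
    Rejects H iff the normalized mean sqrt(n) * Ybar exceeds z_{1-alpha}."""
    n = len(data)
    z = NormalDist().inv_cdf(1 - alpha)   # z_{1-alpha}, e.g. 1.6449 for alpha = 0.05
    return sum(data) / math.sqrt(n) > z   # sqrt(n) * Ybar has unit variance

random.seed(1)
runs, n, mu = 20000, 25, 0.5
# Empirical type I error under H (mu = 0): should be close to alpha = 0.05.
type1 = sum(z_test([random.gauss(0, 1) for _ in range(n)]) for _ in range(runs)) / runs
# Empirical power under K: theoretically 1 - Phi(z_{0.95} - sqrt(25) * 0.5) ~ 0.8.
power = sum(z_test([random.gauss(mu, 1) for _ in range(n)]) for _ in range(runs)) / runs
print(type1, power)
```

The empirical type I error is close to α, and the empirical power matches the theoretical value 1 − Φ(z_{1−α} − √n μ).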

Testing on an Image
Subsequently, we consider three illustrative synthetic images of size 60 × 60, shown in Fig. 11.2 (see the upper panel for a noise-less version and the lower panel for a noisy image). These serve the purpose of explaining how to extend the simple Z-test above to detect a signal in an image, which is a more complex task. To illustrate, we assume for the moment that in these images the intensity of each pixel Y_i, i = 1, . . . , n, follows a N(μ_i, 1) distribution, where each μ_i takes one of the four values 0, 2, 3.5 and 5 (see Fig. 11.2). Now, our goal is to segment the image into regions with signal and empty regions while maintaining statistical error guarantees. Note that we do not aim to recover the exact value of each μ_i, only whether it is positive or not (no signal). To this end we will perform many "local" statistical Z-tests on different (and possibly overlapping) regions of this image. We will discuss several approaches (Scenarios 1-5) which provide a step-by-step derivation of our final solution (Scenario 5). As it turns out, the crucial issue will be to control the statistical error of wrong decisions of all these tests simultaneously (overall error). In the pixel-wise setting, the local test statistics (11.6) are compared to z_{1−α} = 1.6449 (corresponding to α = 0.05). Again, each test allows for two outcomes: rejection or no rejection. In the second row of Fig. 11.4, exemplary results are depicted (all pixels for which positive test decisions have been made are marked green).
It is obvious that Scenario 2 gives more detailed information on the signal, but at the expense of several false detections. This important issue will be discussed in more detail in the following section. It is also obvious that parts of the weak signal are missed (see Fig. 11.4: only 71.25% of the active pixels are detected in the left test image and 85% in the second one). This is due to the fact that the local tests do not take into account neighbouring information (surrounding data) from which they could borrow detection strength. This will also be refined in the subsequent sections.
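The effect of unadjusted pixel-wise testing can be reproduced with a small simulation. The following sketch uses a hypothetical 60 × 60 image with a 20 × 20 block of weak signal (μ = 2) as a stand-in for the test images of Fig. 11.2; both the detection rate on the active pixels and the number of false detections are counted:

```python
import random
from statistics import NormalDist

random.seed(2)
n = 60
z95 = NormalDist().inv_cdf(0.95)   # unadjusted pixel-wise threshold 1.6449

# Hypothetical image: a 20 x 20 block of weak signal (mu = 2) in the centre.
mu = [[2.0 if 20 <= i < 40 and 20 <= j < 40 else 0.0 for j in range(n)] for i in range(n)]
Y = [[mu[i][j] + random.gauss(0, 1) for j in range(n)] for i in range(n)]

active = sum(m > 0 for row in mu for m in row)                              # 400 signal pixels
hits = sum(Y[i][j] > z95 for i in range(n) for j in range(n) if mu[i][j] > 0)
false_pos = sum(Y[i][j] > z95 for i in range(n) for j in range(n) if mu[i][j] == 0)
# Detection rate ~ 1 - Phi(1.6449 - 2) ~ 0.64; false positives ~ 0.05 * 3200 = 160.
print(hits / active, false_pos)
```

Even at the moderate signal strength μ = 2, a third of the active pixels are missed, while roughly 5% of the 3200 empty pixels are falsely selected.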

False test decisions
There are two kinds of possible false test decisions: 1. Type I error (probability of its occurrence is controlled by α).
Here: Selection of a RoI although it does not contain any signal (see lower right panel of Fig. 11.4). 2. Type II error (a missed rejection, not controlled).
Here: Missing to select a RoI that contains signal (see lower left panel of Fig. 11.4).

Testing Multiple Hypotheses
In Scenario 2 in the previous section we applied 400 single Z-tests in the central square of the synthetic image. It is obvious from Fig. 11.4 that this approach suffers from many false detections, in particular when the signal gets sparser (see the lower right plot in Fig. 11.4). This issue becomes even more severe if the number of tests increases, as the following test scenario illustrates. In fact, this does not just happen randomly; it is a systematic flaw which we encounter whenever we naively perform many tests on the same image simultaneously.

Number of False Rejections
The statistical control of false rejections is a general problem encountered in multiple testing (i.e., testing many hypotheses simultaneously on the same data). The increase of false rejections with an increasing number of tests is denoted as the multiplicity effect. Figure 11.6 shows the probabilities that out of n independent Z-tests, at least 1 (solid line), 10 (dashed line), 75 (dotted line) and 150 (dash-dotted line) false rejections occur. The curves suggest that in the situation of Scenario 3 we have to expect at least 150 false detections. In fact, the probability that many wrong rejections are made within N tests, each performed at level α on the same data set, converges to 1 exponentially fast.
Proof The random variables 1{i-th test rejects}, where 1 denotes the indicator function, are independent and follow a Bernoulli distribution with parameter α. Hence the number of false rejections among N ≥ 2 tests follows a Binomial(N, α) distribution, and the probability of making at least k false rejections is

P(at least k of N false rejections) = 1 − Σ_{j=0}^{k−1} C(N, j) α^j (1 − α)^{N−j}.

For fixed k and α ≤ 1/2, each summand is bounded by N^{k−1} (1 − α)^{N−k+1}, since C(N, j) ≤ N^j ≤ N^{k−1} and α^j ≤ 1 for j ≤ k − 1, so the subtracted sum is at most k N^{k−1} (1 − α)^{N−k+1}, which decays geometrically in N. Hence, the probability of making at least k out of N false rejections converges to 1 exponentially fast as N → ∞.
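The multiplicity effect is easy to quantify, since the number of false rejections is Binomial(N, α). A short Python sketch (the values N = 100 and N = 3600 are chosen to mimic the pixel-wise tests; exact binomial tail probabilities are computed via the complement):

```python
from math import comb

def prob_at_least_k_false(N, k, alpha=0.05):
    """P(at least k false rejections among N independent level-alpha tests):
    the upper tail of a Binomial(N, alpha) law, computed via the complement."""
    return 1.0 - sum(comb(N, j) * alpha**j * (1 - alpha)**(N - j) for j in range(k))

# At least one false rejection among N = 100 tests: 1 - 0.95^100 ~ 0.994.
p_any = prob_at_least_k_false(100, 1)
# For N = 3600 pixel-wise tests (as in Scenario 3) the expected number of false
# rejections is N * alpha = 180, so even 150 of them are almost certain.
p_150 = prob_at_least_k_false(3600, 150)
print(p_any, p_150)
```

This reproduces the qualitative behaviour of the curves in Fig. 11.6: already for moderate N, many false rejections become near-certain.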
To reduce the number of false detections, so-called multiplicity adjustments have to be made. Two general approaches in this regard are the control of the family wise error rate (FWER) and of the false discovery rate (FDR). Here, we will mainly focus on the FWER but will briefly discuss FDR control in Sect. 11.2.9. For further reading we refer to the monograph by [5] and the references given there.

Multiplicity effect
If multiple tests are performed without accounting for multiplicity, the chances of making many type I errors are quite large if the false null hypotheses are sparse (see Fig. 11.5).

Control of FWER
One possible way to deal with multiplicity is to control the family-wise error rate (FWER), that is, controlling the probability of making any wrong decision among all tests that are performed. Assume model (11.2) and denote by μ = (μ_1, . . . , μ_n) the vector of all true means and by P_μ the probability under the configuration μ. In the previous imaging example, the sample size n corresponded to the number of pixels. Scenarios 2 and 3 were based on many single tests (on many single pixels). Such single tests will be referred to as local tests in the sequel. Each of the N (say) local tests corresponds to its own (local) hypothesis H_i versus K_i. For example, in the setup of Scenario 3, a local hypothesis is H_i : μ_i = 0 versus the local alternative K_i : μ_i > 0, for some i = 1, . . . , n. In this case N = n when all local hypotheses are tested; if only a few are tested, then N < n. If in addition all 2 × 2 RoIs are tested, a total of N ≈ 2n tests is performed.
Assume now that all local tests H_i vs. K_i are performed, each at error level α/N. Then the risk of making any wrong rejection is controlled at level α, that is, the FWER is controlled: by the union bound, the probability of at least one false rejection is at most the sum of the individual error probabilities, i.e., at most N · α/N = α.
Since the right-hand side is independent of μ, we say that the FWER is controlled in the strong sense. As a consequence, each finding can be considered α-significant and hence can be used as a segment for the final segmentation. Performing tests at an adjusted level such as α/N instead of α is called level adjusted testing, and the multiple test "reject those H_i which are significant at the adjusted level α/N" is called the Bonferroni procedure. We stress that although Theorem 11.1 was formulated for the special case of signal detection in independent Gaussian noise, the Bonferroni procedure strongly controls the FWER in much more generality and in particular without any assumptions on the dependency structure between different tests (see, e.g., [5], Chap. 3.1.1, for a more detailed discussion).
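The difference between unadjusted and Bonferroni-adjusted thresholds can be seen in a small Monte-Carlo experiment under the global null (N = 3600 tests as for a 60 × 60 image; the number of simulation runs is an arbitrary choice):

```python
import random
from statistics import NormalDist

random.seed(3)
N, alpha, runs = 3600, 0.05, 1000
z_plain = NormalDist().inv_cdf(1 - alpha)      # unadjusted threshold 1.6449
z_bonf = NormalDist().inv_cdf(1 - alpha / N)   # Bonferroni threshold (~4.03)

hit_plain = hit_bonf = 0
for _ in range(runs):
    m = max(random.gauss(0, 1) for _ in range(N))   # global null: pure noise
    hit_plain += m > z_plain
    hit_bonf += m > z_bonf
# Unadjusted testing makes at least one false rejection essentially always;
# the Bonferroni procedure keeps the FWER below alpha.
print(hit_plain / runs, hit_bonf / runs)
```

The empirical FWER of the unadjusted procedure is essentially 1, whereas the Bonferroni procedure stays below α, at the price of the much larger threshold.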
Scenario 4 (Unknown position, pixel-wise, Bonferroni adjustment) In the situation of Scenario 3, we now perform a Bonferroni procedure for the entire image, i.e., for all 60 × 60 = 3600 entries (see Fig. 11.7). The local testing problems are H_i : μ_i = 0 versus K_i : μ_i > 0, i = 1, . . . , 3600, each performed at the adjusted level α/3600.

Bonferroni multiplicity adjustment
Adjustment (increase) of the thresholds when multiple tests are performed simultaneously to control the overall type I error, i.e., the FWER. This is a very general but also a conservative method (in particular if the signal is not sparse).

Lemma 11.2 There exists N_0 such that for all N ≥ N_0,

√(2 log N) (1 − o(1)) ≤ z_{1−α/N} ≤ √(2 log(N/α)),

i.e., the Bonferroni-adjusted quantile z_{1−α/N} grows like √(2 log N) as N → ∞.

Proof To bound the normal quantiles from above and below, we use the Mills-ratio bounds

x/(1 + x²) φ(x) ≤ 1 − Φ(x) ≤ φ(x)/x,  x > 0,

[6, see inequality (10)], where φ and Φ denote the density and cdf of the standard normal distribution, respectively. Inserting x = √(2 log(N/α)) into the upper bound gives 1 − Φ(x) ≤ α/N for sufficiently large N, and the right-hand side follows. Inserting x = (1 − ε) √(2 log N), for any ε > 0, into the lower bound gives 1 − Φ(x) > α/N for sufficiently large N, and the left-hand side follows.

Towards Better Detection Properties
The Bonferroni approach is valid in great generality. Nevertheless, as we have seen in Fig. 11.7, if applied pixel-wise, the level adjustment (and the resulting increase of the threshold) is (much) too strict for our purposes. This is not caused by the Bonferroni adjustment per se, as it can be shown that the detection power of the Bonferroni approach cannot be considerably improved in general [7, Sect. 1.4.1]. The issue is that we have only considered each single pixel as input for our local tests. Therefore, we will extend this from single pixels to larger systems of RoIs, which allow us to "borrow strength from neighbouring pixels". This makes sense as soon as the signal has some structure, e.g., whenever the signal appears in (small) clusters or filament-like structures. To see this, suppose that for some k > 1 we have

μ_1 = · · · = μ_k = μ > 0  and  μ_i = 0 for i > k.

An uncorrected pixel-wise Z-test would compare each Y_i to the threshold z_{1−α}, i.e., signal in pixel i would be detected if Y_i = μ + ε_i > z_{1−α}. This is almost impossible if μ is too small or the noise takes a negative value, and it becomes even worse if a multiplicity adjustment is performed. If we instead group the first k pixels together and perform a grouped Z-test, i.e., compare

(1/√k) Σ_{i=1}^k Y_i = √k μ + (1/√k) Σ_{i=1}^k ε_i ∼ N(√k μ, 1)

to z_{1−α}, the signal is "magnified" by a factor √k. Unfortunately, performing, for any k, every test that groups k pixels together, thereby accounting for the fact that the positions i and the numbers k of relevant pixels are in general not known in advance, is infeasible. However, if the data is clustered spatially, we can construct a reasonable test procedure that follows a similar path. Instead of performing all tests that group arbitrary configurations of k pixels, we perform all tests that merge all pixels in a k × k square, for many different values of k, and "scan" the image for signal in such regions in a computationally and statistically feasible way. Now the local tests become (locally highly) correlated (see Sect. 11.2.6), and a simple Bonferroni adjustment no longer provides the best detection power, although (11.7) is still valid. This will be the topic of Sects. 11.2.6 and 11.2.7.

Amplification of the signal strength by aggregation
If a signal is spatially grouped in clusters, cluster-wise tests can increase its detectability. The average of all signal strengths inside the test region is magnified by a factor of √(cluster size).
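The √(cluster size) amplification translates directly into detection power. A short sketch (the per-pixel signal strength μ = 0.5 and the cluster sizes are illustrative choices) computes the power of the grouped Z-test, whose statistic is N(√k μ, 1)-distributed:

```python
import math
from statistics import NormalDist

Phi = NormalDist().cdf
z95 = NormalDist().inv_cdf(0.95)
mu = 0.5   # weak per-pixel signal

# Averaging a k-pixel cluster and renormalizing to unit variance yields a
# N(sqrt(k) * mu, 1) statistic, so the detection power grows with the cluster size.
powers = {k: 1 - Phi(z95 - math.sqrt(k) * mu) for k in (1, 4, 16, 25)}
for k, p in powers.items():
    print(k, round(p, 3))
```

A signal that is nearly undetectable pixel-wise (power ≈ 0.13 for k = 1) becomes detectable with high probability once 25 pixels are aggregated.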

Scanning
In a way, the two approaches of aggregating data over the entire image (Scenario 1) and performing pixel-wise tests (as done in Scenarios 2-4) are the most extreme scenarios. As a rule of thumb, aggregation makes detection easier at the cost of losing spatial precision, whereas pixel-wise testing provides the highest possible spatial precision but makes detection more difficult (after Bonferroni level adjustment, as we have seen in Scenario 4; recall that, since the tests are independent, there is no substantially better way to control the FWER). In a next step we will combine both ideas. We test on various squares of different sizes to achieve accuracy (small regions) where possible and to gain detection power (larger regions) where the signal is not strong enough to be detected pixel-wise, i.e., on small spatial scales. As the system of all subsquares of an image consists of many overlapping squares, we have to deal with locally highly dependent test statistics. Table 11.1 illustrates this effect, presenting simulated values of the family-wise error rate, based on 1000 simulation runs each, with preassigned value α = 0.05. Squares of size h × h, h ∈ {1, 2, 3, 4, 5}, in an image of 60 × 60 pixels are considered. The parameter h is denoted as a spatial scale. The results of this small simulation study demonstrate that the Bonferroni correction is much too strict if we aggregate data in larger squares. The following scenario is tailored towards dealing with this specific type of dependency structure and is called multiscale scanning. Here, the level adjustment is made in an optimal, spatially adaptive way, i.e., the thresholds are large enough that the FWER is controlled, but so small that any smaller thresholds could no longer guarantee control of the FWER. The key is now to exploit that the system of all h × h squares fitting into the n × n image is highly redundant.
For instance, if a square is shifted one pixel to the right, say, both squares share most of their pixels and their contents should not be treated as independent. The multiscale scanning procedure therefore compares the calibrated statistics w(h)(T_{h×h square}(Y) − w(h)) to a common threshold, where w(h) is a size-dependent correction term whose leading order is given by

w(h) ≈ √(2 log(n²/h²)).    (11.10)

Under H, the quantiles can be simulated as described in Algorithm 1. Recall that in Lemma 11.2 it was shown that the quantile z_{1−α/N}, and therefore also the quantile of the maximum of N independent statistics, z_{(1−α)^{1/N}}, are approximately of size √(2 log N). When pixels are aggregated over h × h squares, the corresponding quantiles can be shown to be of first asymptotic order √(2 log(N/h²)) (the leading term of w(h) in (11.10), see Theorem 11.2 for details), which corresponds to the case of N/h² independent tests. This is incorporated into the construction of the thresholds as described in Algorithm 1: simulate pure-noise data Y; compute all test statistics T_{h×h square}(Y); compute all calibrated statistics w(h)(T_{h×h square}(Y) − w(h)); and save their maximal value over all squares and scales.

Algorithm 1: Simulation of the thresholds
In line 12 of Algorithm 1, the size-dependent thresholds q^h_{1−α} = t_j/w(h) + w(h) are defined. Comparing each T_{h×h square}(Y) to q^h_{1−α} yields a multiplicity-adjusted multiple test procedure. Note that Algorithm 1 approximates the quantile of the maximum over all, locally correlated, test statistics under the global null hypothesis. This way, the dependence structure is taken into account precisely.
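A minimal Python sketch of the threshold simulation in Algorithm 1 is given below. It uses a small n × n image, three scales and only the leading-order calibration term w(h) = √(2 log(n²/h²)), so the numerical values are purely illustrative:

```python
import math
import random

def scan_thresholds(n=30, scales=(1, 2, 3), alpha=0.1, runs=200, seed=0):
    """Monte-Carlo sketch of Algorithm 1: under the global null, simulate the
    maximum of the calibrated statistics w(h) * (T - w(h)) over all h x h
    squares and all scales, take the empirical (1 - alpha)-quantile t, and
    return the size-dependent thresholds q_h = t / w(h) + w(h)."""
    rng = random.Random(seed)
    # Leading-order calibration term w(h) = sqrt(2 log(n^2 / h^2)).
    w = {h: math.sqrt(2 * math.log(n * n / h ** 2)) for h in scales}
    maxima = []
    for _ in range(runs):
        Y = [[rng.gauss(0, 1) for _ in range(n)] for _ in range(n)]  # pure noise
        m = -math.inf
        for h in scales:
            for i in range(n - h + 1):
                for j in range(n - h + 1):
                    # Normalized sum over the h x h square at (i, j).
                    t = sum(Y[i + a][j + b] for a in range(h) for b in range(h)) / h
                    m = max(m, w[h] * (t - w[h]))
        maxima.append(m)
    maxima.sort()
    t_alpha = maxima[math.ceil((1 - alpha) * runs) - 1]
    return {h: t_alpha / w[h] + w[h] for h in scales}

q = scan_thresholds()
print(q)   # larger squares get smaller thresholds
```

Note that, in line with the discussion above, the resulting thresholds decrease with the scale h, since fewer "effectively independent" squares of a larger size fit into the image.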

Scenario 5 (Unknown position, multiscale scanning) We now aggregate test results for several different scanning tests. We consider testing each pixel, as well as testing each 2 × 2, 3 × 3, 4 × 4 and 5 × 5 square; in total these are 16,830 tests. We now compare each test statistic to its size-dependent threshold. All significant squares are stored and finally, after all square-wise comparisons have been made, the findings for the different sizes are colour-coded: for each pixel, the colour corresponding to the smallest square in which signal was detected is plotted. The results are shown in Fig. 11.8. One big advantage of this approach is that the weak signal is now also completely included in the segmentation, in contrast even to the unadjusted approach of Scenario 2 (compare the lower left plots of Figs. 11.4 and 11.8). Also, the colour-coding visualizes regions of strong signal and therefore contains "structural information" on the data.
The procedure in Scenario 5 is such that the FWER is still controlled in the strong sense, although the thresholds can be chosen smaller than in a Bonferroni approach. This effect grows as N and h get larger, but it is already visible from h = 4 on, which matches the values given in Table 11.1. This is possible due to the strong local correlations between the tests. Roughly speaking, for each size of the moving window a Bonferroni-type adjustment is made for the (maximum) number of non-overlapping squares of that size, which is a considerable relaxation. Remarkably, the price for including many different sizes is extremely small. More theoretical details can be found in Sect. 11.2.7.
To conclude this section, it should be stressed that in many situations we do not encounter rectangular signals; however, small rectangles can be considered as building blocks for more complex structures. If specific shape information is available, this can be incorporated into the testing procedure as long as the regions are not too irregular and the set of regions satisfies a Vapnik-Chervonenkis-type complexity condition (see [8] for more details). The literature on multiscale scanning methods is vast. In the particular context of imaging, the reader may also consult [9-12] for related ideas.

Fig. 11.8 Noisy signals (upper row) and the corresponding test results from Scenario 5 (lower row). Significant 5 × 5 squares are plotted in yellow; significant 1 × 1 to 4 × 4 squares are plotted in green with increasing brightness. For each pixel, the smallest square which was found significant is plotted. Insignificant regions are coloured in blue.

Multiscale Scanning
With a probability guarantee of 1 − α, all of the RoIs chosen in the multiscale scanning procedure described in Scenario 5 are valid. Hence, we obtain localized RoIs where the signal is sufficiently strong, and we profit from aggregation, as described in Sect. 11.2.5.1, where the signal is weak and point-wise detection is too difficult.

Theory for the Multiscale Scanning Test
The following theorem is the theoretical foundation for Scenario 5.

Theorem 11.2 Simulate the thresholds q^h_{1−α} as in Algorithm 1 (see (11.13)) under the global hypothesis H : "no signal in any of the squares". Reject each local hypothesis H_{h×h square} (see (11.11)) for which

T_{h×h square}(Y) > q^h_{1−α}.    (11.14)

(i) This yields a multiple test for which the FWER at level α is controlled asymptotically (as |H|/n → 0, n → ∞) in the strong sense. (ii) This test is minimax optimal in detecting sparse rectangular regions of the signal.
Claims (i) and (ii) follow from Theorems 7 and 2 in [1]. Roughly speaking, the essence of the previous theorem is that we only need multiplicity control for approximately n²/h² tests (corresponding to the number of independent tests) instead of (n − h + 1)² tests (the actual number of all tests). Control of the FWER in the strong sense means that all significant squares can be used in the final segmentation (lower row of Fig. 11.8).
In this chapter we mainly focused on control of the FWER, however weaker means of error control are of interest as well. A very prominent one is the false discovery rate (FDR, [13]), which we briefly discuss in Sect. 11.2.9.

Deconvolution and Scanning
In photonic imaging additional difficulties arise. Firstly, we have to deal with non-Gaussian and non-i.i.d. data (see Chap. 4), e.g., data following a Poisson distribution with inhomogeneous intensities λ_i. As long as the intensity is not too small, a Gaussian approximation then validates model (11.2) as a reasonable proxy for such situations. A formal justification of the corresponding multiscale tests is based on recent results by [14]; for details see [1]. The price to pay for such an approximation is a lower bound on the sizes of the testing regions that can be used, due to the fact that several data points (of logarithmic order in n) need to be aggregated for the Gaussian approximation to be valid. For ease of notation, we only discussed the Gaussian case in Sect. 11.2.7; generalizations to other distributions can be found in [8].
Secondly, convolution with the PSF of the imaging device induces blur. The first row of Fig. 11.9 shows the convolved versions of the synthetic images from the upper row of Fig. 11.2; the images in the central row are noisy versions of these convolved images. Note that some structures are no longer identifiable by eye after convolution. Applying the multiscale scanning approach of Scenario 5 naively to the convolved data (central row of Fig. 11.9) yields the result shown in the lower row of Fig. 11.9, which demonstrates that this is not a competitive strategy and strongly suggests taking the convolution into account.

Fig. 11.9 Signals after convolution (upper row), noisy versions (central row) and the corresponding test results from Scenario 5, naively applied to the convolved data (lower row). Significant 5 × 5 squares are plotted in yellow; significant 1 × 1 to 4 × 4 squares are plotted in green with increasing brightness. For each pixel, the smallest square which was found significant is plotted. Insignificant regions are coloured in blue.
We now briefly sketch how to adapt the multiscale scanning procedure (Scenario 5) to the convolution setting. Notice that in the case of data (11.2), we can write the test statistic (11.5) for a particular square S as

T_S(Y) = ⟨I_S, Y⟩,

where Y = (Y_1, . . . , Y_n) denotes the data vector and I_S denotes the scaled indicator function of S, i.e., I_S(j) = 1/√|S| if j ∈ S and 0 else. Now the indicator functions are considered as a system of probe functions which are tested on the data Y. In the case of convolution with the PSF k (e.g., of the microscope), model (11.2) turns into

Y_i = (k ∗ μ)_i + ε_i,  i = 1, . . . , n,    (11.15)

where "∗" denotes convolution. The goal is to find a probe function acting on the convolved data, denoted by I*_S, such that

⟨I*_S, k ∗ μ⟩ = ⟨I_S, μ⟩,

that is, I*_S should locally deconvolve. Let μ = (μ_1, . . . , μ_n). Then, if F denotes the discrete Fourier transform, by the Plancherel isometry and the convolution theorem,

⟨I_S, μ⟩ = ⟨F I_S, F μ⟩ = ⟨F I_S / F k, F(k ∗ μ)⟩.

This means that (provided F k ≠ 0)

I*_S = F^{−1}(F I_S / F k)

is a reasonable choice of a probe system for the data (11.15), and a statistic that adapts to the convolution is given by T*_S(Y) = ⟨I*_S, Y⟩. Scenario 5 can now be performed, following Algorithm 1 to derive suitable thresholds and replacing I_S by I*_S, and the FWER is controlled. More precisely, it can be shown that Theorem 11.2 also applies in this scenario (see [1] for details). Figure 11.10d shows the result of this adapted test procedure (MISCAT) applied to our original data (Fig. 11.10a). As a comparison, we also applied Scenario 5 naively to the data set (Fig. 11.10f). Analogously to [15], I_S can be chosen such that MISCAT with I*_S performs optimally in terms of detection power.
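The construction of the deconvolving probe functions can be illustrated in one dimension with a naive DFT. The PSF k below is a hypothetical symmetric blur kernel (chosen so that F k has no zeros), convolution is taken to be circular, and, with the complex inner product, the division is by the conjugate of F k:

```python
import cmath
import math

def dft(x):
    """Naive discrete Fourier transform, O(n^2) -- enough for a small demo."""
    n = len(x)
    return [sum(x[j] * cmath.exp(-2j * math.pi * k * j / n) for j in range(n))
            for k in range(n)]

def idft(X):
    n = len(X)
    return [sum(X[k] * cmath.exp(2j * math.pi * k * j / n) for k in range(n)) / n
            for j in range(n)]

def circ_conv(k, mu):
    """Circular convolution (k * mu)(j) = sum_a k(a) mu(j - a mod n)."""
    n = len(mu)
    return [sum(k[a] * mu[(j - a) % n] for a in range(n)) for j in range(n)]

n = 16
k = [0.6, 0.2] + [0.0] * 13 + [0.2]   # hypothetical symmetric PSF, F k > 0 everywhere
S = [5, 6, 7]                         # region of interest
I_S = [1 / math.sqrt(len(S)) if j in S else 0.0 for j in range(n)]
mu = [2.0 if j in S else 0.0 for j in range(n)]

FI, Fk = dft(I_S), dft(k)
# Deconvolving probe I*_S = F^{-1}(F I_S / F k); the conjugate accounts for the
# complex inner product (F k is real here since k is symmetric).
I_star = idft([FI[t] / Fk[t].conjugate() for t in range(n)])

blurred = circ_conv(k, mu)                                 # noiseless data k * mu
lhs = sum(I_star[j].real * blurred[j] for j in range(n))   # <I*_S, k * mu>
rhs = sum(I_S[j] * mu[j] for j in range(n))                # <I_S, mu> = 2 * sqrt(3)
print(lhs, rhs)
```

Up to floating-point error, the probe I*_S applied to the blurred signal reproduces exactly the value that I_S would extract from the unblurred signal, i.e., it locally deconvolves.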

Deconvolution and scanning
In convolution problems, sums of pixel values over spatial regions (e.g., squares) are replaced by probe functionals over the pixels (weighted sums), which can be designed in an optimal way for a given convolution kernel k. The resulting multiscale test scans over all probe functionals, which results in substantially more precise segmentation results (for a direct comparison see the lower left and lower right panels of Fig. 11.10). It still controls the FWER.

FDR Control
As discussed in the previous sections of this chapter, as the sample size increases (and with it the number of tests), the control of the FWER becomes more difficult, which may result in low detection power, e.g., in three-dimensional imaging. Therefore, a strategy to obtain less conservative procedures of error control is to relax the FWER. The most prominent relaxation is the false discovery rate (FDR, [13]), defined as

FDR = E[ #false rejections / max{#all rejections, 1} ],

that is, the average proportion of false rejections among all rejections. Hence, in contrast to the FWER, this criterion scales with the number of rejections. The control of the FDR is in general a weaker requirement than the control of the FWER. Procedures that control the FDR are often written in terms of p-values. In the situation of the Z-test with test statistics T_{h×h square}(Y) as in (11.12), the p-values are given as

p_{h×h square} = 1 − Φ(T_{h×h square}(Y)),

where Φ denotes the cumulative distribution function of the standard normal distribution. The smaller the p-value, the stronger the evidence that the null hypothesis should be rejected.
Benjamini-Hochberg Procedure ([13]) Consider a multiple testing procedure consisting of independent tests with p-values p_1, ..., p_N. Sort the p-values increasingly, p_(1) ≤ p_(2) ≤ ... ≤ p_(N), and reject all null hypotheses corresponding to p_(1), ..., p_(i*), where

    i^* = \max\{ i : p_{(i)} \le i \alpha / N \}.

Reference [16] already proposed the above procedure but pointed out that this approach lacks a theoretical justification, which was later given by [13], who showed that FDR ≤ (N_0/N)·α, where N_0 denotes the number of true null hypotheses.
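The step-up rule above can be written in a few lines; the function below is a generic implementation of the BH procedure (function and variable names are our own):

```python
import numpy as np

def benjamini_hochberg(p_values, alpha=0.05):
    """Benjamini-Hochberg step-up procedure: reject the hypotheses with the
    i* smallest p-values, where i* is the largest i with p_(i) <= i*alpha/N.
    Returns a boolean rejection mask in the original order."""
    p = np.asarray(p_values, dtype=float)
    N = p.size
    order = np.argsort(p)                        # indices sorting p increasingly
    below = p[order] <= alpha * np.arange(1, N + 1) / N
    reject = np.zeros(N, dtype=bool)
    if below.any():
        i_star = np.nonzero(below)[0].max()      # position of p_(i*)
        reject[order[: i_star + 1]] = True       # step-up: reject all up to i*
    return reject
```

Note the step-up character: once p_(i*) falls below its threshold, all smaller p-values are rejected as well, even those that exceed their own individual thresholds i·α/N.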
Scenario 6 (Benjamini-Hochberg (BH) Procedure) In the situation of Scenario 3, we also performed the BH procedure for all 60 × 60 = 3600 entries of the third test image (see left panel of Fig. 11.11). The result is displayed in the rightmost panel of Fig. 11.11, while in the centre, for comparison, the result of the Bonferroni procedure on the same data set is displayed. Obviously, more parts of the signal have been found; however, several true signal components are still missed and one false discovery is included.
There is a vast literature on FDR control, and many generalizations have been proposed. For instance, if N_0/N is much smaller than 1, corresponding to the case of a non-sparse signal, the procedure controls the FDR at a much smaller level than α, and refined versions of the BH procedure in which N_0/N is estimated from the data have been proposed (see, e.g., [17,18] and the references given therein).
While the BH procedure grants control of the FDR in the test Scenarios 2 and 3 due to independence between pixels, the situation in Scenario 5 is more delicate due to the strong local correlations, in particular in the presence of convolution, where a suitable FDR procedure is still an open problem and currently investigated by the authors. We stress that, while FDR control under specific dependency structures has been investigated by many authors (see, e.g., [19,20] and the references given therein), none of the existing methods provides a procedure tailored to deconvolution problems as they occur in photonic imaging. The construction of such adjusted methods is a worthwhile focus for future research.

Statistical Multiscale Estimation
If one is further interested in the recovery (estimation) of the unknown signal, the multiscale testing procedure developed in Sects. 11.2.6 and 11.2.8 actually provides a collection of feasible candidates for this task, in the sense that all signals which fall in the acceptance region of one of the afore-mentioned tests can be considered "likely", as they cannot be rejected by such a scanning test. More precisely, if we assume model (11.15), any signal μ̃ which satisfies

    \max_{h \in H} \max_{S \in S(h)} \big( |\langle I^*_S, Y - \tilde\mu * k \rangle| - w(h) \big) \le q^*_{1-\alpha}   (11.18)

cannot be rejected. Here H and S(h) are defined in Theorem 11.2, q^*_{1−α} is the (1 − α)-quantile of the left-hand side of (11.18) with Y − μ̃ * k replaced by the pure noise ε, w(h) is the scale correction term given in (11.10), and I^*_S is as in (11.16). Among all the candidates μ̃ satisfying (11.18), we will pick the most regular one. This is done by means of a (convex) functional S(·), defined on a domain D for μ̃, which encodes prior information about the unknown signal, e.g. sparsity or smoothness. Thus, the final estimator μ̂ is defined as

    μ̂ ∈ \arg\min_{\tilde\mu \in D} S(\tilde\mu)  subject to  μ̃ satisfies (11.18).   (11.19)

Because of the choice of q^*_{1−α} we readily obtain the regularity guarantee

    \mathbb{P}\big( S(\hat\mu) \le S(\mu) \big) \ge 1 - \alpha,

i.e., the resulting estimator is at least as regular as the truth with probability at least 1 − α, whatever the configuration μ of the truth is. Furthermore, the remaining residuum Y − μ̂ * k is accepted as pure noise by the multiscale procedure described in Sect. 11.2.8. Before we discuss possible ways to solve the minimization problem (11.19), note that by the construction of I^*_S in (11.16) we have ⟨I^*_S, μ̃ * k⟩ = ⟨I_S, μ̃⟩, and hence the computation can be sped up by avoiding convolutions between μ̃ and k. Next we emphasize that the discretization of (11.19) has the form

    \arg\min_{\tilde\mu} S(\tilde\mu)  subject to  \underline\lambda \le K \tilde\mu \le \bar\lambda,   (11.20)

where λ̲, λ̄ are vectors, K is a matrix, and "≤" acts element-wise. Thus, whenever S is convex, the whole problem is convex (but non-smooth) and can be solved by many popular methods.
In Algorithm 2 we give one possibility which arises from applying the primal-dual hybrid gradient method [21] to an equivalent reformulation of the first order optimality conditions of (11.20) (which are necessary and sufficient by convexity).
One alternative to Algorithm 2 is the alternating direction method of multipliers (ADMM), which can be applied directly to (11.20) and is compatible with any convex functional S [23]. However, Algorithm 2 avoids the projection onto the intersection of convex sets, and turns out to be much faster in practice if step 4 in Algorithm 2 can be efficiently computed. For further algorithms relevant for this problem, see Chaps. 6 and 12.
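To make the structure of (11.20) concrete, the following is a generic primal-dual hybrid gradient (Chambolle-Pock type) iteration for problems of this form, not Algorithm 2 from this chapter: we take S as the l1 norm (its prox is soft thresholding), and the blur matrix, sizes, and band widths are hypothetical choices for a toy example.

```python
import numpy as np

def pdhg_band_constrained_l1(K, lo, hi, n_iter=5000):
    """Primal-dual hybrid gradient for  min ||x||_1  s.t.  lo <= K x <= hi,
    i.e. a problem of the form (11.20) with S = l1 norm."""
    m, n = K.shape
    L = np.linalg.norm(K, 2)                  # spectral norm ||K||
    tau = sigma = 0.9 / L                     # steps with tau*sigma*||K||^2 < 1
    x = np.zeros(n)
    x_bar = np.zeros(n)
    y = np.zeros(m)
    for _ in range(n_iter):
        # dual step: prox of the convex conjugate of the box indicator,
        # computed via Moreau's identity from the projection (np.clip)
        v = y + sigma * (K @ x_bar)
        y = v - sigma * np.clip(v / sigma, lo, hi)
        # primal step: soft thresholding, the prox of tau*||.||_1
        z = x - tau * (K.T @ y)
        x_new = np.sign(z) * np.maximum(np.abs(z) - tau, 0.0)
        x_bar = 2.0 * x_new - x               # extrapolation
        x = x_new
    return x

# toy deconvolution-flavoured example: K is a circulant blur matrix
n = 40
kernel = {-1: 0.2, 0: 0.6, 1: 0.2}            # strictly positive DFT
K = np.zeros((n, n))
for i in range(n):
    for off, w in kernel.items():
        K[i, (i + off) % n] += w
x0 = np.zeros(n)
x0[[5, 20, 33]] = [1.0, -2.0, 1.5]            # sparse ground truth
data = K @ x0
lo, hi = data - 0.1, data + 0.1               # band constraints, feasible by design
x_hat = pdhg_band_constrained_l1(K, lo, hi)
```

As in the discussion above, only matrix-vector products with K and K^T and two cheap proximal maps are needed per iteration; in particular, no projection onto the intersection of the band constraints is ever computed.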
We stress that a crucial part of the estimator μ̂ in (11.18)-(11.19) is the choice of the probe functionals I^*_S from Sect. 11.2.8. In Fig. 11.12, this estimator μ̂ is referred to as MiScan (short for multiscale image scanning), whereas MrScan (short for multiscale residual scanning) denotes an estimator of a similar form as μ̂ but with I^*_S replaced by I_S (see [23-26]), i.e., the convolution is not explicitly taken into account in the probe functional. MiScan recovers significantly more features over a range of scales (i.e., various sizes) than MrScan.
There is good theoretical understanding of the estimator μ̂ given by (11.18)-(11.19) for the regression model (11.2), that is, for k = δ_0, the Dirac delta function, in model (11.15). In case S is a Sobolev norm, [27] shows the minimax optimality of μ̂ for Sobolev functions of fixed smoothness, and [28] further shows the optimality over Sobolev functions with varying smoothness (adaptation). In case S is the total variation semi-norm, [29] show the minimax optimality of such an estimator for functions of bounded variation. All the results above are established for L^p-risks (1 ≤ p ≤ ∞). For the more general model (11.15), [30] provide some asymptotic analysis with respect to a relatively weak error measure, the Bregman divergence. A detailed analysis of MiScan exploring the probe functionals in (11.16) in a convolution model is still open and currently investigated by the authors.

Fig. 11.12 Comparison on a deconvolution problem (SNR = 100, convolution kernel k satisfying F k(ξ) = 1/(1 + 0.09 |ξ|²)). MiScan is defined by (11.18)-(11.19); MrScan is similar to MiScan but with I^*_S replaced by I_S; for both methods, the regularization functional S is chosen as the total variation semi-norm.
Open Access This chapter is licensed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license and indicate if changes were made.
The images or other third party material in this chapter are included in the chapter's Creative Commons license, unless indicated otherwise in a credit line to the material. If material is not included in the chapter's Creative Commons license and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder.