# Statistical Density Estimation Using Threshold Dynamics for Geometric Motion

DOI: 10.1007/s10915-012-9615-6

- Cite this article as:
- Kostić, T. & Bertozzi, A. J Sci Comput (2013) 54: 513. doi:10.1007/s10915-012-9615-6


## Abstract

Our goal is to estimate a probability density based on discrete point data via segmentation techniques. Since point data may represent certain activities, such as crime, our method can be successfully used for detecting regions of high activity. In this work we design a binary segmentation version of the well-known Maximum Penalized Likelihood Estimation (MPLE) model, as well as a minimization algorithm based on thresholding dynamics originally proposed by Merriman et al. (The Computational Crystal Growers, pp. 73–83, 1992). We also present some computational examples, including one with actual residential burglary data from the San Fernando Valley.

### Keywords

Statistical density estimation · Image segmentation · Thresholding · Ginzburg-Landau functional

## 1 Introduction

\(d(\mathbf{x})\) based on point data \(\mathbf{x}_{1},\mathbf{x}_{2},\ldots,\mathbf{x}_{N}\in\mathbb{R}^{n}\), and they take a form

\(R(d)\) is a penalty function. A wide variety of different penalty functions are proposed in the literature, and many of them are designed to impose additional smoothness on the density function. Single variable density estimation with a non-smooth density function is discussed in [27] and [18] where the authors of both works introduced the TV semi-norm as a penalty function. In their work [22], Mohler et al. construct the first two dimensional TV norm based MPLE:

\(H^{1}\) norm as a penalty function, and they successfully integrated geographic information of the observed region to get a more accurate density estimate. The authors use city maps, census data as well as other types of geographical data to determine the region where the events typically occur, the valid region \(D\). Knowing that the density function is zero anywhere outside the valid region, the authors align the zero level sets of the valid region and the density function. In their modified MPLE model aligning is achieved through addition of the alignment term, and the model they propose is

\({}_{D}\) is a characteristic function of the valid region. In the case of the weighted \(H^{1}\) MPLE model, in order to allow the density function \(d\) to have sharp jumps on the border of the valid region, the \(H^{1}\) penalty functional was forced away from the boundary of the valid region. Thus, the proposed model becomes:

\(z_{\epsilon}\) is a continuous function such that

\(w\) is a function that represents the given data, \(u\) is a segmentation function, and \(d(u)\) is a two phase density function that can be determined from \(u\). We define \(w\) as a sum of Dirac \(\delta\) functions \(w=\sum_{i}^{N}\delta(\mathbf{x}_{i})\), where \(\mathbf{x}_{i}\) are data points. This function is introduced for the purpose of more compact notation. Note that in Eq. (5) the maximum likelihood term \(\int w\log(d(u))\,dx\) involves the data function \(w\), while in Eqs. (4), (3), (2) and (1) we used the sum \(\sum_{i=1}^{N} \log(d(\mathbf{x}_{i}))\). The dense region \(\Sigma\) is located via segmentation function \(u\), i.e. \(u=\chi_{\Sigma}\). Using the information on the data set we obtain through function \(w\), as well as the assumption that density \(d(u)\) is piecewise constant, we calculate the values of \(d(u)\) in \(\Sigma\) and \(\Sigma^{C}\).

This paper is organized as follows. In Sect. 2 we give some background on variational methods for image segmentation and describe the MBO scheme. In Sect. 3 we present some background on MPLE models, discuss the proposed model in more detail, and calculate the time-dependent Euler-Lagrange equation to minimize the functional (5). Section 4 explains the thresholding dynamics for minimization of our energy functional. Details on the numerical implementation are presented in Sect. 5, computational examples in Sect. 6, and parameter selection by *V*-fold cross validation in Sect. 7.

## 2 Background on Variational Methods in Image Segmentation and MBO Scheme

\(f\) is an image that should be segmented, and \(u\) is a segmentation function. One of the major results in image segmentation is the Chan-Vese algorithm presented in [5], where the authors, inspired by the level set method by Osher and Sethian [24], propose a method for minimizing the piecewise constant version of (6):

\(\Sigma\) is a segmented region. The level set version of (7) proposed in [5]

\(\Sigma\) being represented as a 0-level set of the function \(\phi(x): D\to\mathbf{R}\). After the optimal values for constants \(c_{1}\) and \(c_{2}\) are determined, the time-dependent Euler-Lagrange equation for \(\phi\) is found:

\(H_{\epsilon}\) is an approximation of the Heaviside function, and a semi-implicit numerical scheme is created to solve (9). Due to the computational complexity of the numerical algorithm that solves Eq. (9), Gibou and Fedkiw created algorithms that do not explicitly solve (9). In their work [16], they designed the hybrid k-means level set method and applied it to images that were previously processed using the Perona-Malik diffusion. Another algorithm that successfully minimizes the piecewise constant Mumford-Shah functional without solving the gradient descent equation (9) is proposed in [30]. Numerous modifications of the piecewise constant Mumford-Shah model, and different ways to minimize it, have appeared in the literature (see [2, 4, 6, 9, 14, 17] and [29]).

\(\Gamma\)-convergence. This property justifies replacing the TV norm with the Ginzburg-Landau functional. The reason one might prefer the GL functional over the TV semi-norm is the simplicity of the numerical minimization scheme that the GL functional yields. Indeed, the \(L^{2}\) gradient descent of the GL functional gives us the Allen-Cahn equation

\(O(\epsilon)\). The Ginzburg-Landau functional has appeared in many image processing applications, such as inpainting [3, 7] and segmentation [9, 10].
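For reference, the \(L^{2}\) gradient descent mentioned above can be sketched as follows. The scaling of the functional is an assumption here: we take \(GL_{\epsilon}(u)=\epsilon\int|\nabla u|^{2}\,dx+\frac{1}{\epsilon}\int W(u)\,dx\) with the double-well potential \(W(u)=u^{2}(1-u)^{2}\), which matches the leading terms of Eqs. (17)-(19), and we assume boundary conditions under which the boundary term from integration by parts vanishes. For a test function \(\varphi\),

$$ \frac{d}{ds}GL_{\epsilon}(u+s\varphi)\Big|_{s=0}=\int\Bigl(-2\epsilon\Delta u+\frac{1}{\epsilon}W'(u)\Bigr)\varphi\,dx, \qquad W'(u)=2u(1-u)(1-2u), $$

so that descending along the negative \(L^{2}\) gradient gives

$$ u_t=2\epsilon\Delta u-\frac{1}{\epsilon}W'(u). $$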

\(C\) by studying the diffusion equation \(\chi_{t}=\Delta\chi\), where \(\chi\) is the characteristic function of the set \(\Sigma\) and \(\partial\Sigma=C\), i.e. \(C\) represents a sharp front between two phases of the characteristic function. The appropriate change of coordinates suggested in [20] reveals that, when diffusion is applied, any point on the front moves with the normal velocity that is equal to the mean curvature at that point. Simultaneously, the front is radially blurred. However, the \(\chi=\frac{1}{2}\) level set is invariant to blurring. From there it follows that the \(\chi=\frac{1}{2}\) level set yields motion by mean curvature. Based on the previous observation the following numerical scheme is proposed:

**Step 1** Let \(v(x)=S(\delta t)u_{n}(x)\), where \(S(\delta t)\) is a propagator by time \(\delta t\) of the equation

$$v_t=\Delta v $$

with appropriate boundary conditions.

**Step 2** Threshold

$$u_{n+1}(x) = \begin{cases} 0 & \text{if } v(x) \in(-\infty,\frac{1}{2}],\\[6pt] 1 & \text{if } v(x) \in(\frac{1}{2},\infty). \end{cases} $$
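The two steps above can be sketched in a few lines of Python. This is a minimal illustration rather than the implementation used in the paper: it assumes periodic boundary conditions so that the heat propagator can be applied exactly in Fourier space, and the helper names `diffuse_periodic`, `mbo_step` and `disk` are hypothetical.

```python
import numpy as np

def diffuse_periodic(u, dt, h=1.0):
    """Apply the heat propagator S(dt) of v_t = Laplacian(v) via FFT.

    Periodic boundary conditions are an assumption of this sketch.
    """
    ny, nx = u.shape
    ky = 2.0 * np.pi * np.fft.fftfreq(ny, d=h)
    kx = 2.0 * np.pi * np.fft.fftfreq(nx, d=h)
    k2 = ky[:, None] ** 2 + kx[None, :] ** 2
    return np.real(np.fft.ifft2(np.fft.fft2(u) * np.exp(-dt * k2)))

def mbo_step(u, dt, h=1.0):
    """One MBO iteration: diffuse for time dt (Step 1), threshold at 1/2 (Step 2)."""
    v = diffuse_periodic(u, dt, h)
    return (v > 0.5).astype(float)

def disk(n, radius):
    """Characteristic function of a centered disk on an n-by-n grid."""
    y, x = np.mgrid[:n, :n]
    return ((x - n // 2) ** 2 + (y - n // 2) ** 2 <= radius ** 2).astype(float)

u = disk(128, 30)
area_start = u.sum()
for _ in range(4):
    u = mbo_step(u, dt=25.0)
print(area_start, u.sum())  # the disk shrinks, as expected for curvature motion
```

The timestep is deliberately large relative to the grid spacing; with a much smaller `dt` the thresholded front may not move at all between iterations.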

We are interested in motion by mean curvature because of its close relation to the Allen-Cahn equation (10).

*f*. The first variation of the model (11) yields the following gradient descent equation:

**Step 1** Let \(v(x)=S(\delta t)u_{n}(x)\), where \(S(\delta t)\) is a propagator by time \(\delta t\) of the equation

$$w_t=\Delta w-2\tilde{\lambda} \bigl(w(c_1-f)^2+(1-w) (c_2-f)^2 \bigr) $$

with appropriate boundary conditions.

**Step 2** Set

$$u_{n+1}(x) = \begin{cases} 0 & \text{if } v(x) \in(-\infty,\frac{1}{2}],\\[6pt] 1 & \text{if } v(x) \in(\frac{1}{2},\infty). \end{cases} $$

\(\delta t\) timestep in **Step 1** of the algorithm. If \(\delta t\) is chosen too large compared to the parameter \(\lambda^{-1}\), the interface tends to be overly smooth. As an advantage of choosing larger values for \(\delta t\), they mention faster convergence. The segmentation could also benefit from larger values of the parameter \(\lambda\), as there is less penalty on high curvature of the interface in this case. The spatial resolution has to be taken into account when the value for \(\delta t\) is chosen, as values much smaller than the spatial resolution could cause the interface to stall. These observations can also be used as guidance in parameter selection for our algorithm. The authors also show that MBO thresholding type methods for binary segmentation can easily be generalized to multi-phase segmentation methods. To accomplish that, the authors propose the four phase model based on the modified version of (11) with the sum of the two Ginzburg-Landau functionals built around two different segmentation functions, \(u_{1}\) and \(u_{2}\). In this case, the fidelity term naturally depends on both \(u_{1}\) and \(u_{2}\). After they find the gradient descent equations with respect to \(u_{1}\) and \(u_{2}\), they construct the thresholding numerical scheme to solve the obtained system of parabolic equations. We will adapt ideas by Esedoglu and Tsai to solve our MPLE problem and illustrate the usefulness of this simple method.

Some extensions of the MBO algorithm appeared in [11, 12, 21]. An efficient algorithm for motion by mean curvature using adaptive grids was proposed in [26].

## 3 MPLE Methods and Proposed Model

\(n\) points \(\mathbf{x}_{1},\mathbf{x}_{2},\ldots,\mathbf{x}_{n}\) and is a sample of \(n\) independent random variables with common density \(d_{0}\). Maximum likelihood estimation is a standard method for estimating a density function \(d_{0}\) based on given data. In the case of the parametric model, we know that \(d_{0}\) belongs to the family of density functions \(D=\{d(\cdot,\theta): \theta\in\Theta\}\), and our goal is to find a parameter \(\theta_{0}\) such that \(d(\cdot,\theta_{0})=d_{0}(\cdot)\). This class of problems is known as parametric density estimation. According to Fisher's model [13], introduced in 1922, an optimal parameter \(\theta_{0}\in\Theta\) (where \(\Theta\) is a set of all parameters) satisfies the following:

\(d_{0}\) belongs to may be unknown, in which case we are dealing with a nonparametric density estimate. The analog of the model (13) is an ill-posed problem, i.e. finding a probability density function \(d_{0}\) such that

\(R(d)\) was introduced, and this method is known as maximum penalized likelihood estimation:

\(R(d)=\int_{\Omega}|\nabla^{3}\log d|^{2}\) from [8]. These and many other standard penalty functionals enforce smoothness on the density function, but do not perform well when the density function has sharp gradients, for example when it is piecewise constant. To resolve this issue, Koenker and Mizera in [18] as well as Sardy and Tseng in [27] propose the penalty functional to be the TV semi-norm. This approach was also successfully used in [28] and [22]. In our work, since we assume the density is a step function, choosing a penalty functional that can successfully handle sharp gradients is crucial. As previously mentioned, instead of the TV semi-norm, we chose the Ginzburg-Landau functional to be the penalty functional.

### 3.1 General Model

For now we are going to focus on the segmentation function *u*. We assume our segmentation function is the characteristic function of the region *Σ*, where *Σ* is an area with a larger density. For any given data and any given segmentation function there is a unique density function corresponding to them. With *w* being the function that approximates the data, the total number of events is approximately equal to ∫*w*, while the number of events inside and the number of events outside of the region *Σ* are approximated by ∫*wu* and ∫*w*(1−*u*), respectively. Accordingly, the density *c*_{1}(*u*) inside the region *Σ* is equal to \(\frac{\int wu}{\int u \int w}\) and the density *c*_{2}(*u*) in the region *Σ*^{C} is equal to \(\frac{\int w(1-u)}{\int w \int(1-u)}\). Finally, we write the density function as *c*_{1}(*u*)*u*+*c*_{2}(*u*)(1−*u*). The established correspondence between the segmentation and the density function suggests that building a diffuse interface MPLE model around the segmentation function is possible. As the segmentation function takes only 0 and 1 values, the Ginzburg-Landau functional is a natural choice. As the density is a rescaled segmentation function, using the Ginzburg-Landau functional for *u*, as opposed to the Ginzburg-Landau functional for the density, seems both reasonable and convenient.
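The correspondence between a segmentation and its two-phase density can be checked numerically. The sketch below assumes that *w* is given as per-pixel event counts on a uniform grid with spacing `h` and that neither phase is empty; the function name `two_phase_density` is hypothetical.

```python
import numpy as np

def two_phase_density(w, u, h=1.0):
    """Return (c1, c2, d) for data counts w and a binary segmentation u.

    Integrals are approximated by sums times the cell area h*h.
    Assumes both {u = 1} and {u = 0} are non-empty.
    """
    cell = h * h
    int_w = w.sum() * cell
    int_u = u.sum() * cell
    int_1mu = (1.0 - u).sum() * cell
    c1 = (w * u).sum() * cell / (int_u * int_w)            # density inside Sigma
    c2 = (w * (1.0 - u)).sum() * cell / (int_1mu * int_w)  # density outside Sigma
    d = c1 * u + c2 * (1.0 - u)
    return c1, c2, d

w = np.zeros((4, 4))
w[0, 0], w[1, 1], w[2, 0], w[3, 3] = 3, 2, 1, 2   # 8 events in total
u = np.zeros((4, 4))
u[:, :2] = 1.0                                     # Sigma = left half of the grid
c1, c2, d = two_phase_density(w, u)
print(c1, c2, d.sum())
```

By construction the resulting density integrates to one, since \(c_{1}\int u+c_{2}\int(1-u)=\frac{\int wu+\int w(1-u)}{\int w}=1\).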

\(w\) represents the given data, \(W(u)=u^{2}(1-u)^{2}\) and \(\mu\) is a parameter. As we already mentioned, the Ginzburg-Landau functional converges to the TV norm in the sense of \(\Gamma\)-convergence, as \(\epsilon\to0^{+}\). As a consequence, the diffuse interface model we propose here converges to the TV-based MPLE that was used in [22, 28], when \(\epsilon\to0^{+}\). Now, variation of energy of (16) gives us the following cases for the \(L^{2}\) gradient descent equation:

- If both \(c_{1}(u)\) and \(c_{2}(u)\) (from here on we write \(c_{1}\) and \(c_{2}\) for simplicity of notation) are non-zero:

  $$ u_t=2\epsilon\Delta u-\frac{1}{\epsilon}W'(u)+\mu w \biggl[\frac{c_1-c_2}{c_1}u+\frac{c_1-c_2}{c_2}(1-u)+\biggl(\frac{\int {(1-u)w}}{\int{1-u}}- \frac{\int{u w}}{\int{u}}\biggr)\biggr]. \tag{17} $$

- If \(c_{1}\) is equal to zero:

  $$ u_t=2\epsilon\Delta u-\frac{1}{\epsilon}W'(u)+\mu\biggl[ w(u-1)+\biggl(\frac{\int{(1-u)w}}{\int{1-u}}-w\biggr)\biggr]. \tag{18} $$

- If \(c_{2}\) is equal to zero:

  $$ u_t=2\epsilon\Delta u-\frac{1}{\epsilon}W'(u)+\mu\biggl[ wu+\biggl(w-\frac{\int{u w}}{\int{u}}\biggr)\biggr]. \tag{19} $$

### 3.2 Special Case Model

\(\Sigma^{C}\) is zero, so the density function is just a rescaled segmentation function. Thus, replacing the density function \(c_{1}(u)u+c_{2}(u)(1-u)\) in the MPLE term of our general model by the segmentation \(u\) seems reasonable. Now, the model we propose is:

\(w\) is a function that approximates the data, in order to make everything well-defined we introduce the model with a small constant \(\nu\):

## 4 Proposed Dynamics

\(A(u(\cdot,t))\) and \(B(u(\cdot,t))\) being non-linear functions. Each of the gradient descent equations (17), (18), (19) and (22) can be given in the form (23), where \(A(u(\cdot,t))\) and \(B(u(\cdot,t))\) take different values in different cases. Motivated by the MBO scheme for solving the Allen-Cahn equation, we propose a thresholding scheme to approximate the solution of Eq. (23). Esedoglu and Tsai used a similar approach to minimize the Mumford-Shah segmentation functional in [9]. The first step we need to take toward generating a thresholding scheme is finding a good way to split Eq. (23) into two steps analogous to those proposed in the MBO scheme. In that regard, finding a way that successfully deals with the non-linear forcing term of Eq. (23) is critical. Inspired by the algorithm presented in [9] we propose the following dynamics:

**Step 1.** Let \(v(x)=S(\delta t)u_{n}(x)\), where \(S(\delta t)\) is a propagator by time \(\delta t\) of the equation

$$y_t=\Delta y-A\bigl(y(\cdot,t)\bigr)y+B\bigl(y(\cdot,t)\bigr) $$

with appropriate boundary conditions.

**Step 2.** Set

$$u_{n+1}(x) = \begin{cases} 0 & \text{if } v(x) \in(-\infty,\frac{1}{2}],\\[6pt] 1 & \text{if } v(x) \in(\frac{1}{2},\infty). \end{cases} $$
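A minimal sketch of these two steps is given below, under several assumptions that are ours rather than the paper's: the propagation uses explicit Euler sub-steps with a five-point Laplacian, zero-flux boundaries are imposed by edge padding, and `A` and `B` are passed in as hypothetical callables that map the current state `y` to a scalar or an array. The concrete forms of `A` and `B` for Eqs. (17)-(19) and (22) are not reproduced here.

```python
import numpy as np

def laplacian_5pt(y, h=1.0):
    """Five-point Laplacian with zero-flux boundaries (edge padding)."""
    p = np.pad(y, 1, mode="edge")
    return (p[:-2, 1:-1] + p[2:, 1:-1] + p[1:-1, :-2] + p[1:-1, 2:] - 4.0 * y) / h ** 2

def propagate(u, dt, A, B, n_sub=40, h=1.0):
    """Step 1: explicit Euler for y_t = Lap(y) - A(y)*y + B(y) over total time dt.

    For the diffusion part the sub-step dt/n_sub should not exceed h*h/4.
    """
    y = u.astype(float).copy()
    tau = dt / n_sub
    for _ in range(n_sub):
        y = y + tau * (laplacian_5pt(y, h) - A(y) * y + B(y))
    return y

def threshold_dynamics(u0, dt, A, B, n_iter=50, n_sub=40, tol=1e-3):
    """Alternate propagation (Step 1) and thresholding at 1/2 (Step 2)."""
    u = u0.astype(float)
    for _ in range(n_iter):
        v = propagate(u, dt, A, B, n_sub)
        u_new = (v > 0.5).astype(float)
        # Stop when the relative L2 change between iterations is small.
        change = np.linalg.norm(u_new - u) / max(np.linalg.norm(u), 1e-12)
        u = u_new
        if change < tol:
            break
    return u

# With A = B = 0 the scheme reduces to plain MBO and a disk shrinks.
yy, xx = np.mgrid[:48, :48]
u0 = ((xx - 24) ** 2 + (yy - 24) ** 2 <= 100).astype(float)
u3 = threshold_dynamics(u0, dt=5.0, A=lambda y: 0.0, B=lambda y: 0.0, n_iter=3)
print(u0.sum(), u3.sum())
```

Passing `A` and `B` as callables keeps the propagation loop independent of which of the gradient descent equations is being approximated.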

## 5 Numerical Implementation

\(u_{n}\) being an initial condition. To generate the numerical results we denote the timestep by \(\delta\tau\), and approximate the Laplacian by its five-point stencil, which gives us the following scheme:

\(\delta t\). After the propagation phase, a thresholding step is necessary to complete the iteration:

\(l\) is a total number of iterations we made in the propagation phase. A small relative change of the \(L^{2}\) norm between two consecutive iterations was used as a stopping criterion.

In this implementation, the data function *w* is used as an initial condition, along with Dirichlet or Neumann boundary conditions.

### 5.1 Adaptive Timestepping

The timestep in the propagation phase, a “sub-timestep”, can be chosen to optimize performance. In the early stage of computation, it is important to keep the sub-timestep small in order to obtain a good estimate in the propagation phase. However, as our algorithm approaches steady state, a large number of iterations in the propagation phase becomes a burden on the computational time. To speed up the convergence of our algorithm, we used adaptive timestepping, a modified form of the scheme proposed in [1].

\(t^{n}\) uses solution at three consecutive timesteps \(t^{n-1}\), \(t^{n}\) and \(t^{n+1}\). Let us define \(e^{n+1}=\frac{(v^{n+1}-v^{n})}{v^{n}}\) and \(e^{n}=\frac{(v^{n}-v^{n-1})}{v^{n}}\), as well as \(\Delta t_{old}=t^{n}-t^{n-1}\). The previous definitions allow us to define a dimensionless estimate of the local truncation error:

We used adaptive timestepping at two different levels: in the propagation phase of the algorithm we adapt the sub-timestep, and we also adapt the initial sub-timestep for future iterations. In the propagation phase of any iteration, we calculate a dimensionless truncation error estimate for different propagation times. Once the error is smaller than a given tolerance *Tol*_{1} for a certain number of consecutive iterations, we increase the sub-timestep by 10 %. We also estimate the dimensionless error in every iteration of the algorithm, and if we find the error to be smaller than *Tol*_{2}, the initial sub-timestep in the propagation phase of the next iteration is increased by 10 %. However, we never allow the initial sub-timestep to be larger than \(\frac{1}{8}\) of the timestep. Notice that we are not adapting the timestep itself; the total propagation time in each iteration is the same.
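The update rule described above can be sketched as a small controller. The error estimate itself is taken as an input, since its formula is not reproduced here. The number of consecutive small-error steps required (`needed=3`) and the reset of the counter after each increase are assumptions of this sketch, and `update_substep` is a hypothetical name.

```python
def update_substep(tau, err, streak, tol, needed=3, growth=1.10, tau_max=None):
    """Return the updated (sub-timestep, streak) pair.

    tau     : current sub-timestep
    err     : dimensionless truncation error estimate (computed elsewhere)
    streak  : number of consecutive steps with err < tol so far
    tol     : tolerance (Tol_1 or Tol_2 in the text)
    needed  : consecutive small-error steps before growing (assumed value)
    tau_max : optional cap, e.g. one eighth of the timestep
    """
    streak = streak + 1 if err < tol else 0
    if streak >= needed:
        tau *= growth          # increase by 10 %
        streak = 0             # assumed: start counting again after a change
    if tau_max is not None:
        tau = min(tau, tau_max)
    return tau, streak

dt = 0.1                       # timestep of one outer iteration
tau, streak = 0.01, 0
for err in [1e-4, 1e-4, 1e-4]:
    tau, streak = update_substep(tau, err, streak, tol=1e-3, tau_max=dt / 8)
print(tau)                     # grown once, still below the cap dt / 8
```

The same function can serve both levels: inside the propagation phase with *Tol*_{1}, and between iterations with *Tol*_{2} and the cap `tau_max`.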

### 5.2 Adaptive Resolution

Another way to improve the computational time is to use adaptive resolution. As we mentioned before, we use the data function *w* as an initial condition when solving Eq. (23). It is reasonable to assume that the more the initial condition “resembles” the solution, the fewer iterations the algorithm takes to obtain the solution. The main idea is to generate a lower resolution form of the data set, then use the low resolution solution to create a good initial guess for the high resolution solution. Providing a good initial guess for the higher resolution problem is particularly useful because iterations on the higher resolution versions of the data set tend to be slower.

In this implementation, we typically applied this procedure several times on some sparse data sets. At each step we create a coarser form of the given data set, until we reach a version of the data set that has a satisfying density. Our experiments show that data sets with the total density between 0.05 and 0.2 are optimal for this algorithm. Once a sufficiently dense low resolution version of the data set is obtained, we run our algorithm to get the low resolution solution, and start working our way up from there. The higher resolution approximation of the solution is then generated, and used as an initial condition in the next step, in which we solve the problem on the data set that has a higher resolution. It is important to mention that this process does not alter the original data set. We call this process *n-step adaptive resolution*, where *n* is the total number of times we reduced the resolution of the original data set.

The number of steps, *n*, is closely related to our choice of timestep. When we are segmenting the region of higher density in our data, we noticed, through multiple experiments, that the timestep can often be given as *ω*2^{n}, where *n* is the number of levels in adaptive resolution, and *ω*∈[0.15,0.2]. When we are locating the valid region, we usually allow a smaller timestep, but also a larger number of levels in adaptive resolution. However, starting with a problem that has a significantly lower resolution compared to the original one, we might run into some problems. Decreasing the resolution significantly may result in a very different looking data set, so the segmentation would not perform in the expected way, i.e. this first initial guess would not be a good approximation of the solution we are trying to find.
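The coarsening and refinement steps can be sketched as follows. Several choices here are assumptions rather than details taken from the text: the resolution is halved by summing 2×2 blocks of event counts, “total density” is read as events per pixel, and a coarse segmentation is refined by pixel replication. The names `coarsen`, `refine`, `build_pyramid` and `suggested_timestep` are hypothetical.

```python
import numpy as np

def coarsen(w):
    """Halve the resolution by summing 2x2 blocks (even dimensions assumed)."""
    ny, nx = w.shape
    return w.reshape(ny // 2, 2, nx // 2, 2).sum(axis=(1, 3))

def refine(u):
    """Double the resolution of a segmentation by pixel replication."""
    return np.kron(u, np.ones((2, 2)))

def build_pyramid(w, min_density=0.05, max_levels=6):
    """Coarsen copies of w until events per pixel reach min_density.

    The original array w is left untouched; levels[0] is w itself.
    """
    levels = [w]
    while levels[-1].sum() / levels[-1].size < min_density and len(levels) <= max_levels:
        ny, nx = levels[-1].shape
        if ny % 2 or nx % 2:
            break
        levels.append(coarsen(levels[-1]))
    return levels

def suggested_timestep(n, omega=0.15):
    """Timestep omega * 2**n for n-step adaptive resolution, omega in [0.15, 0.2]."""
    return omega * 2 ** n

w = np.zeros((8, 8))
w[1, 1] = w[6, 5] = 1.0            # two events: 2/64 is below 0.05
levels = build_pyramid(w)
n = len(levels) - 1                # number of resolution reductions
print(n, levels[-1].shape, suggested_timestep(n))
```

In use, one would solve on `levels[-1]`, apply `refine` to the resulting segmentation, and use it as the initial condition on the next finer level, repeating until the original resolution is reached.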

## 6 Computational Examples

### 6.1 Test Shapes

### 6.2 Orange County Coastline

\(\mu\) can lead to segmentation of the dense regions at different levels. All images in this section have resolution of 600×1000 pixels.

### 6.3 San Fernando Valley Residential Burglary Data

\(\Sigma\), and our goal is to segment the region. The region \(\Sigma\) would represent a valid region of the given data set. In absence of geographic data that describe the location of the valid region, an accurate estimate of it can dramatically improve accuracy of the density estimation, see [28] and [22]. In the following example, our goal was to, without using any spatial information, segment the valid region from the San Fernando Valley residential burglary data. The events in Fig. 12(a) represent locations where burglaries took place during 2004 and 2005. The contour of our valid region estimate obtained by applying the special case model is also shown in Fig. 12(a). Smith et al. in [28] performed the valid region estimate using census and other types of data to locate the residential area in the region of interest, and their result is Fig. 12(b). They incorporated the valid region estimate from Fig. 12(b) in their Weighted \(H^{1}\) MPLE model to obtain the density estimate results from Fig. 12(c). The TV MPLE algorithm developed by Mohler et al. in [22], was used to generate the density estimate in Fig. 12(b). This method did not use any additional spatial information to locate the valid region.

## 7 *V*-Fold Cross Validation

\(\mu\) and the timestep can affect the performance of this algorithm. Our experiments show that when we are segmenting the high density region, the optimal value of the parameter \(\mu\) is not larger than 0.2. When our goal is to estimate the valid region, we typically assign larger values to the parameter \(\mu\). To estimate the value of the smoothing parameter we implemented a version of the \(V\)-fold cross validation algorithm.

In their work [27], Sardy and Tseng proposed \(V\)-fold cross validation based on the Kullback-Leibler information. In \(V\)-fold cross validation, the original data set is partitioned into \(V\) disjoint subsets \(\mathbf{x}_{v}=\{x_{i}, i\in S_{v}\}\), where \(S_{v}\) consists of all indexes of data points from the partition \(v=1,\ldots,V\). The set \(\mathbf{x}_{-v}=\{x_{i}, i\notin S_{v}\}\) is used as a *training set*, i.e. the algorithm is applied on \(\mathbf{x}_{-v}\) with some particular value \(\hat{\mu}\), and the density \(\hat{d}_{\hat{\mu},-v}\) is estimated. The set \(\mathbf{x}_{v}\) is a *validating set*, which means that \(\{ \hat{d}_{\hat{\mu},-v}(x_{i})\}_{i\in S_{v}}\) is used to estimate the density on \(\mathbf{x}_{v}\). Following these observations, the authors of [27] proposed the following estimate of Kullback-Leibler information \(CV(\hat{\mu})=-\sum_{v=1}^{V}\sum_{i\in S_{v}}\hat{d}_{\hat{\mu },-v}(x_{i})\), and after a search over the set of parameters \(\hat{\mu}\), the one that minimizes this quantity is selected.

However, \(CV(\hat{\mu})\) uses only the log-likelihood to predict the performance of the model for some value \(\hat{\mu}\) of the smoothing parameter, but does not take the \(H^{1}\) norm of the segmentation function of the estimated density into account. We denote the segmentation function that corresponds to the density \(\hat{d}_{\hat{\mu},-v}\) by \(\hat{u}_{\hat{\mu},-v}\). Since the segmentation function is a binary function, the discrete \(H^{1}\) and the discrete \(TV\) norm are equivalent, thus the \(H^{1}\) norm also measures the length of the front between two phases. In some applications, such as the case when the dense and the surrounding sparse region have similar densities, the values of \(CV(\mu)\) for the different values of \(\mu\) tend to be similar. It is useful in those cases to also measure the \(H^{1}\) norm of the obtained segmentation \(\hat{u}_{\hat{\mu},-v}\), and incorporate that information in the \(V\)-fold cross validation. Because of that, we propose a slightly different technique, where we evaluate \(CV_{H^{1}}(\hat{\mu}) =-\sum_{v=1}^{V}(\sum_{i\in S_{v}}\hat{d}_{\hat{\mu},-v}(x_{i})-\xi\int{|\nabla\hat{u}_{\hat{\mu },-v}|^{2}})\) for each value of \(\hat{\mu}\) from some proposed set of parameters, and select the value that minimizes it. The results do not appear to be very sensitive to \(\xi\); we used small values, comparable to those of \(\hat{\mu}\).

The evaluation of \(CV_{H^{1}}(\mu)\) for a single value of parameter \(\mu\) requires \(V\) different density estimates, which could cause the \(V\)-fold cross validation to be very computationally intensive. However, all density estimates that have to be performed are independent of each other, which makes the \(V\)-fold cross validation a perfect candidate for parallelization. In this implementation, we used 10-fold cross validation, and the process of calculating \(CV_{H^{1}}\) is parallelized using 10 threads, which reduces the computational time of one evaluation of \(CV_{H^{1}}\) down to the computational time needed for one density estimate. The computational time one density estimate takes varies from 0.2 s in the small scale examples (100×100 pixels) to around one minute in the large scale examples (600×1000 pixels). To demonstrate the performance of the proposed algorithm, Figs. 13 and 14 show some computational examples with segmentations generated using a model with the smoothing parameter obtained through 10-fold cross validation. To find the parameter \(\mu\) we performed a linear search over intervals.
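The score \(CV_{H^{1}}\) can be sketched as below. The density estimator is passed in as a hypothetical callable `estimate(w_train, mu)` returning a density grid and a binary segmentation, data points are assumed to be integer pixel indices, and the gradient integral uses forward differences. The formula as printed sums \(\hat{d}\) directly, while the surrounding text describes it as a log-likelihood, so the sketch exposes a `use_log` switch instead of committing to one reading.

```python
import numpy as np

def grad_sq_integral(u, h=1.0):
    """Discrete integral of |grad u|^2 using forward differences."""
    gy = np.diff(u, axis=0) / h
    gx = np.diff(u, axis=1) / h
    return ((gy ** 2).sum() + (gx ** 2).sum()) * h * h

def cv_h1(points, shape, mu, estimate, V=10, xi=0.05, use_log=False, seed=0):
    """V-fold cross validation score with an H^1 term on the segmentation.

    points   : (N, 2) integer array of (row, col) pixel indices
    estimate : hypothetical callable (w_train, mu) -> (density d, segmentation u)
    """
    rng = np.random.default_rng(seed)
    folds = np.array_split(rng.permutation(len(points)), V)
    score = 0.0
    for idx in folds:
        train = np.ones(len(points), dtype=bool)
        train[idx] = False
        w_train = np.zeros(shape)
        # Accumulate training events; np.add.at handles repeated pixels.
        np.add.at(w_train, (points[train, 0], points[train, 1]), 1.0)
        d, u = estimate(w_train, mu)
        vals = d[points[idx, 0], points[idx, 1]]
        if use_log:
            vals = np.log(np.maximum(vals, 1e-12))
        score -= vals.sum() - xi * grad_sq_integral(u)
    return score

# Usage with a trivial stand-in estimator (uniform density, empty segmentation).
uniform = lambda w_train, mu: (np.full(w_train.shape, 1.0 / w_train.size),
                               np.zeros(w_train.shape))
rng = np.random.default_rng(1)
pts = np.column_stack([rng.integers(0, 4, 20), rng.integers(0, 5, 20)])
candidates = [0.05, 0.1, 0.2]
best_mu = min(candidates, key=lambda m: cv_h1(pts, (4, 5), m, uniform, V=5))
```

Because the folds are independent, the loop body is the natural unit to distribute across threads or processes.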

## 8 Conclusion

This work demonstrates that threshold dynamics methods for image segmentation are a powerful tool for statistical density estimation for problems involving two dimensional geographic information. The efficiency of the method, especially when combined with multi-resolution techniques makes this a practical choice for parameter estimation involving *V*-fold cross validation, especially when parallel platforms are available. The method is a binary segmentation method that also determines density values. However, it can be naturally generalized to multi-level segmentation. One way to achieve that may include representing the segmentation function as a linear combination of the multiple binary components, similarly to the idea used for generalizing binary to grayscale inpainting in [7]. However, this requires sufficient data to warrant a multi-level segmentation.

## Acknowledgements

This paper is dedicated to Prof. Stanley Osher on the occasion of his 70th Birthday. We would like to thank Laura Smith and George Mohler for their helpful comments. This work was supported by ONR grant N000141210040, ONR grant N000141010221, AFOSR MURI grant FA9550-10-1-0569, NSF grant DMS-0968309 and ARO grant W911NF1010472, reporting number 58344-MA.

### Open Access

This article is distributed under the terms of the Creative Commons Attribution License which permits any use, distribution, and reproduction in any medium, provided the original author(s) and the source are credited.