Scale dependence of distributions of hotspots

We consider a random field $\phi(\mathbf{r})$ in $d$ dimensions which is largely concentrated around small `hotspots', with `weights' $w_i$. These weights may have a very broad distribution, such that their mean does not exist, or else is not a useful estimate. In such cases, the median $\overline W$ of the total weight $W$ in a region of size $R$ is an informative characterisation of the weights. We define the function $F$ by $\ln \overline W=F(\ln R)$. If $F'(x)>d$, the distribution of hotspots is dominated by the largest weights. In the case where $F'(x)-d$ approaches a constant positive value when $R\to \infty$, the hotspot distribution has a type of scale invariance which is different from that of fractal sets, and which we term \emph{ultradimensional}. The form of the function $F(x)$ is determined for a model of diffusion in a random potential.


Introduction
In many cases two-dimensional scalar fields are largely supported on small areas, 'hotspots'. Examples include the distribution of human populations, which are concentrated in urban settlements; the distribution of debris on the ocean, which can be concentrated in regions where cool or saline water is subducted; and deposits of mineral ores, which can be concentrated at the points where dissolved material is deposited from evaporating water. Another example is images of star fields, where the stars appear as points. Subjectively, images of the distribution of hotspots can appear to bear a family resemblance. This paper addresses the question of how these distributions can be characterised, and whether they have scale-invariant features.
The fields that we consider can be modelled by random processes. We consider random, non-negative scalar fields in a two-dimensional space, denoted by $\phi(\mathbf r)$, with statistics which are homogeneous (translationally invariant) and isotropic (rotationally invariant). Extensions to higher dimensions are straightforward.
In the cases where the field is highly concentrated in the vicinity of isolated points, we can consider the following simple model. We take a uniform, independent random scatter of points $\mathbf r_i$ on the plane, with density $\rho$. Each point is assigned a random weight $w_i$, drawn independently from a distribution with probability density function (PDF) $p(w)$. The weights $w_i$ represent the integral of $\phi(\mathbf r)$ in the neighbourhood surrounding one of the points upon which it is concentrated.
The primary interest will be in the cases where $p(w)$ has a very broad distribution. Accordingly, we introduce a power-law model, such that for large $w$ the PDF is $p(w)\sim w^{-\gamma}$. In the calculations below we shall use the following specific distribution as an example:
$$p(w)=\begin{cases}(\gamma-1)\,w^{-\gamma}, & w\ge 1\\ 0, & w<1\end{cases} \tag{1}$$
with $1<\gamma<2$, so that the distribution is normalisable, but its mean is undefined. We shall also need to consider the cumulative distribution: if $P(w_0)$ is the probability that $w>w_0$, then (1) implies that $P(w)=w^{-(\gamma-1)}$ for $w>1$. It will be argued that this is a foundational model for the distribution of hotspots, and that distributions obtained from more general models can be approximated using this model, with a suitable choice of the parameter $\gamma$.
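The cumulative form $P(w)=w^{-(\gamma-1)}$ gives a one-line sampler by inverse-transform sampling. The sketch below (Python with NumPy; the function name `sample_weights` is illustrative, not taken from the paper) draws weights from distribution (1) and compares the empirical survival fraction with $P(w)$.

```python
import numpy as np


def sample_weights(n, gamma, rng):
    """Draw n weights from p(w) = (gamma - 1) * w**(-gamma) on w >= 1.

    Inverse-transform sampling: since P(w > w0) = w0**(-(gamma - 1)),
    w = u**(-1 / (gamma - 1)) has the required law when u is uniform on (0, 1].
    """
    u = 1.0 - rng.random(n)  # rng.random is uniform on [0, 1), so u lies in (0, 1]
    return u ** (-1.0 / (gamma - 1.0))


rng = np.random.default_rng(0)
gamma = 5 / 3
w = sample_weights(100_000, gamma, rng)

# Compare the empirical survival fraction with P(w0) = w0**(-(gamma - 1))
for w0 in (2.0, 10.0, 100.0):
    print(w0, np.mean(w > w0), w0 ** (-(gamma - 1.0)))
```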
Figure 1 is an illustration of 12 different realisations of this model for hotspot distributions, with $\gamma=5/3$, plotted on four different length scales. It will be argued that the statistics of these images have a scale-invariance property, in that it is impossible to identify the scale factors of the panels. Non-trivial scale invariance is usually associated with fractal [1,2] (or more generally, multifractal [3,4]) properties, which can usually be characterised by saying that the set is, in some sense, self-similar under a change of scale. The images in figure 1 are so diverse that it would require a large number of realisations to demonstrate that they are drawn from the same ensemble. We shall argue below that there is a simple quantitative distinction between the scale invariance of figure 1 and that of fractal sets.
There are two complementary aspects to characterising these sets. Firstly, we might wish to know how the total weight of the hotspots increases with the size of the region. Consider the set $\mathcal W_R$ of approximately $\rho R^2$ values of $w_i$ for which $\mathbf r_i$ lies inside a square of side $R$. The set has the total weight
$$W=\sum_{i\in\mathcal W_R} w_i \tag{2}$$
If the weights had a compact distribution, we would estimate the mean value of $W$ as $\langle W\rangle=\rho R^2\langle w\rangle$, but for the distribution (1), $\langle w\rangle$ is infinite ($\langle X\rangle$ denotes the expectation value of $X$ throughout). A more promising approach is to estimate the median value $\overline W$. This is considered in Section 2 below. We anticipate that $\overline W$ will increase very rapidly as a function of the scale length $R$. Accordingly, we use a logarithmic scale. We can characterise a given hotspot distribution by means of a function $F$:
$$\ln\overline W=F(\ln R) \tag{3}$$
For the simple model described above, we show that $\overline W$ has a power-law dependence upon $R$, so $F(x)$ is a linear function. The exponent of this power law can be thought of as a type of dimension of the set of hotspots, and even if $F$ is not a linear function we can define an effective dimension $D_{\rm eff}$ at the length scale $R$ as the derivative
$$D_{\rm eff}(R)=F'(\ln R)=\frac{\mathrm{d}\ln\overline W}{\mathrm{d}\ln R} \tag{4}$$
For our simplified model it will be shown that, for points distributed randomly in $d$ dimensions with the weight distribution (1),
$$D_{\rm eff}=\frac{d}{\gamma-1} \tag{5}$$
Note that, because $2>\gamma>1$, this effective dimension is higher than the dimension of the embedding space. This indicates that the effective dimension is different from a fractal dimension. We describe this scale invariance as ultradimensional.
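As a sketch of how $F$ and $D_{\rm eff}$ might be estimated from data, the snippet below takes median total weights $\overline W$ measured at a few scales $R$ and differentiates $\ln\overline W$ with respect to $\ln R$ by finite differences. It also applies the rearrangement $\gamma_{\rm eff}=1+d/D_{\rm eff}$ of the relation $D_{\rm eff}=d/(\gamma-1)$. The helper names and the synthetic power-law input are assumptions made for illustration only.

```python
import numpy as np


def effective_dimension(R, W_median):
    """D_eff = F'(ln R) = d ln(W_median) / d ln R, by finite differences in ln R."""
    x = np.log(np.asarray(R, dtype=float))
    F = np.log(np.asarray(W_median, dtype=float))
    return np.gradient(F, x)


def gamma_effective(D_eff, d):
    """Rearranged D_eff = d / (gamma - 1):  gamma_eff = 1 + d / D_eff."""
    return 1.0 + d / np.asarray(D_eff, dtype=float)


# Synthetic input: a pure power law W ~ R**(d / (gamma - 1)) with d = 2, gamma = 1.5
d, gamma = 2, 1.5
R = np.array([8.0, 16.0, 32.0, 64.0])
W_median = 3.0 * R ** (d / (gamma - 1.0))
D_eff = effective_dimension(R, W_median)
print(D_eff)                       # close to 4 at every R
print(gamma_effective(D_eff, d))   # close to 1.5
```

For measured data the finite-difference derivative is noisy, so in practice one might smooth $\ln\overline W$ or fit over a window of scales before differentiating.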
A second aspect of describing the hotspot distribution is to consider the relative sizes of the largest values of $w_i$ in the set $\mathcal W_R$. We can transform this to a filtered and normalised set, $\widetilde{\mathcal W}_R$, as follows. We scale the hotspot positions by dividing by $R$, and plot $\mathbf r_i/R$ inside a unit square. We eliminate the values of $w_i$ below a chosen threshold, for example, those $w_i$ that are less than $\epsilon W$, where $\epsilon$ is a given small positive number. We can also 'normalise' these sets by dividing every remaining $w_i$ by $W$. These normalised and filtered sets are a natural representation of many types of pointset data. An example is a geographical map showing settlements using symbols whose sizes are relative to those of the largest settlements in the mapped region, where settlements below a certain size are not shown in order to eliminate clutter. Another example is a photograph of the night sky with the exposure adjusted so that the image saturation is normalised, and stars below a certain intensity are not registered at all. The filtered and normalised sets can be characterised by considering the relative sizes of the largest values of $w_i$. To this end, we can sort the weights $w_i$ into a decreasing sequence $\{\vec w_i\}$, and consider the proportion of the total mass which is contained in the first $k$ elements of this set,
$$f_k=\frac{\sum_{i=1}^{k}\vec w_i}{\sum_{i=1}^{\tilde N}\vec w_i} \tag{6}$$
where $\tilde N$ is the number of elements in the filtered set. We can consider the average of $f_k$ over different regions of the data, and in some cases we can also average over multiple realisations of the distribution. For the model defined by equation (1), this leads to a family of functions of $\gamma$:
$$\langle f_k\rangle=\bar f_k(\gamma),\qquad k=1,2,\ldots \tag{7}$$
We shall make a hypothesis that, for a general model, the set of values of $f_k$ at length scale $R$ is representative of the model (1), with an effective value of $\gamma$ given by rearrangement of (5):
$$\gamma_{\rm eff}(R)=1+\frac{d}{D_{\rm eff}}=1+\frac{d}{F'(\ln R)} \tag{8}$$
If the derivative of the function $F(x)$ defined by equation (3) approaches a constant as $x\to\infty$, this is indicative of the sets $\widetilde{\mathcal W}_R$ having scale-invariant properties, such that the statistics of $\widetilde{\mathcal W}_R$ and $\widetilde{\mathcal W}_{\lambda R}$ are indistinguishable for a wide range of values of the positive number $\lambda$. This idea can be expressed by saying that the realisations of $\widetilde{\mathcal W}_R$ are drawn from an ensemble which is independent of $R$, depending only upon $\gamma_{\rm eff}$. In the case of the power-law model, this scale invariance is manifest. This self-similarity could be trivial, or it could indicate that the hotspot distribution has fractal properties, or something different. It will be argued that it is the latter possibility which is realised. Figure 1 shows examples of sets $\widetilde{\mathcal W}_R$ generated by this model, displayed on four different length scales (we used $\rho=1$, $\gamma=5/3$, and $R=80, 400, 2000, 10000$). We show 12 sample sets, three at each of the different scale factors. The twelve images look so different from each other that it is not evident that they are drawn from the same ensemble. The different values of the scale factor $R$ were randomly assigned (the key is in the figure caption), and its value cannot be determined by inspection of an individual realisation. Despite the fact that the length scales vary by a factor of 125, it is not possible to distinguish which of these images corresponds to which value of $R$.
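The filtering, normalisation and ranking steps described above can be sketched as follows, assuming hotspot positions and weights are available as NumPy arrays. The function `filtered_fractions` is hypothetical: it applies the threshold $\epsilon W$, rescales positions by $R$, divides the surviving weights by $W$, and returns $f_k$ computed relative to the filtered total, as in (6). Averaging over regions or realisations would simply be a loop over calls.

```python
import numpy as np


def filtered_fractions(positions, weights, R, eps, kmax):
    """Filter, normalise and rank the hotspots found in a square of side R.

    positions : (n, 2) array of hotspot coordinates in [0, R)^2
    weights   : (n,) array of weights w_i
    Returns rescaled positions r_i / R, normalised weights w_i / W of the
    hotspots with w_i >= eps * W, and f_k for k = 1..kmax (fewer if fewer survive).
    """
    W = weights.sum()                          # total weight W of the region
    keep = weights >= eps * W                  # drop hotspots lighter than eps * W
    scaled = positions[keep] / R               # positions r_i / R in the unit square
    normalised = weights[keep] / W             # weights relative to W
    ranked = np.sort(normalised)[::-1]         # decreasing sequence of weights
    f = np.cumsum(ranked)[:kmax] / ranked.sum()
    return scaled, normalised, f


# Illustrative realisation of the power-law model (1) with rho = 1, gamma = 5/3
rng = np.random.default_rng(1)
R, rho, gamma = 100.0, 1.0, 5 / 3
n = int(rho * R**2)
positions = rng.random((n, 2)) * R
weights = (1.0 - rng.random(n)) ** (-1.0 / (gamma - 1.0))
scaled, normalised, f = filtered_fractions(positions, weights, R, eps=1e-3, kmax=5)
print(len(normalised), np.round(f, 3))
```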
The results in section 2 will quantify the non-trivial scale invariance of the power-law model under a change of the magnification of the image. We discuss the statistics of $\overline W$ for this simple model, leading to the relation (5) between the exponent $\gamma$ and the dimension $D_{\rm eff}$.
In section 3 we discuss a physical example of a hotspot distribution, namely the probability density for a particle diffusing in a two-dimensional Gaussian random potential, $V(x,y)$. The equilibrium probability density is proportional to $\exp[-V(x,y)/D]$, where $D$ is the diffusion coefficient. In the limit where the diffusion coefficient approaches zero, this density is concentrated at 'hotspots' which are minima of the potential function, with weights $w\approx\exp[-V_{\rm min}/D]$, where $V_{\rm min}$ is the value of the local minimum of the potential at the hotspot. We identify the functions $F(\ln R)$ and $\gamma_{\rm eff}(R)$ for this model, including their dependence upon the diffusion coefficient $D$. Section 4 is a brief conclusion.

Statistics of a simple model
Consider how the statistics of the total weight $W$ depend upon $R$ for the power-law model, with weight distribution (1). The mean value of $w$ is undefined, so calculating the expectation value $\langle W\rangle$ is not a good approach: the mean value is dominated by rare realisations where one or more of the $w_i$ takes a very large value. Estimating the median of $W$, which will be denoted by $\overline W$, appears to be more promising. If $\hat w$ is the largest of the $N\sim\rho R^2$ samples of $w$ in the square, then we might hypothesise that $\overline W$ is proportional to $\overline{\hat w}$, the median of the largest value in the sample. Here it will be argued that the multiplier $\overline W/\overline{\hat w}$ is independent of both $R$ and $\epsilon$.
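It is easy to calculate $w^*\equiv\overline{\hat w}$. The probability that none of the $N$ independent values of $w$ exceeds $\hat w$ is $[1-P(\hat w)]^N$, so that $w^*$ satisfies $[1-P(w^*)]^N=1/2$. For large $N$ this gives $P(w^*)\simeq\ln 2/N$, that is,
$$w^*\simeq\left(\frac{N}{\ln 2}\right)^{1/(\gamma-1)} \tag{9}$$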
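A quick numerical check of the median of the largest weight, assuming the Pareto form (1): the exact root of $[1-P(w^*)]^N=1/2$ is $w^*=(1-2^{-1/N})^{-1/(\gamma-1)}$, and the sketch compares it with the large-$N$ form (9) and with the sample median of the maximum over simulated realisations. The values of $N$, $M$ and $\gamma$ are arbitrary choices, smaller than those used for the figures.

```python
import numpy as np


def median_largest_exact(N, gamma):
    """Solve [1 - P(w*)]**N = 1/2 exactly, with P(w) = w**(-(gamma - 1))."""
    p_star = -np.expm1(-np.log(2.0) / N)  # 1 - 2**(-1/N), evaluated stably
    return p_star ** (-1.0 / (gamma - 1.0))


def median_largest_asymptotic(N, gamma):
    """Large-N form of equation (9): w* ~ (N / ln 2)**(1 / (gamma - 1))."""
    return (N / np.log(2.0)) ** (1.0 / (gamma - 1.0))


rng = np.random.default_rng(2)
N, M, gamma = 200, 20_000, 1.5
samples = (1.0 - rng.random((M, N))) ** (-1.0 / (gamma - 1.0))  # M realisations of N weights
empirical = np.median(samples.max(axis=1))                      # median of the largest weight
ratio_sim = empirical / median_largest_exact(N, gamma)
ratio_asym = median_largest_asymptotic(N, gamma) / median_largest_exact(N, gamma)
print(ratio_sim, ratio_asym)
```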
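Next we estimate the number of points $\tilde N$ in the filtered set, and the value of $\overline W$. Using $NP(w^*)\simeq\ln 2$, the number of values of $w_i$ in the range from $w^*$ (upper limit) to $\epsilon w^*$ (lower limit) is
$$\tilde N\simeq N\left[P(\epsilon w^*)-P(w^*)\right]\sim\ln 2\,\epsilon^{-(\gamma-1)} \tag{10}$$
so that the number of points $\tilde N$ in the filtered set is independent of $R$, although it does depend upon $\epsilon$.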
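The median of the sum $W$ of a large number of values of $w_i$ is estimated by noting that $W=\hat w+\tilde W$, where $\hat w$ is the largest of the $w_i$, and $\tilde W$ is the sum excluding the largest of the $w_i$. The value of $\tilde W$ will be approximated by its mean value, which depends upon $\hat w$. Writing $\hat w=aw^*$, and taking the leading order as $N\to\infty$, $\epsilon\to 0$,
$$\langle\tilde W\rangle\simeq N\int_{\epsilon w^*}^{aw^*}\mathrm{d}w\,w\,p(w)=\frac{\gamma-1}{2-\gamma}\,N\left[(aw^*)^{2-\gamma}-(\epsilon w^*)^{2-\gamma}\right]\simeq\ln 2\,\frac{\gamma-1}{2-\gamma}\,a^{2-\gamma}\,w^* \tag{11}$$
where we used $N(w^*)^{-(\gamma-1)}\simeq\ln 2$. This gives the following estimate for $W$, in terms of $a=\hat w/w^*$:
$$W\simeq w^*\left[a+\ln 2\,\frac{\gamma-1}{2-\gamma}\,a^{2-\gamma}\right] \tag{12}$$
The value of $W$ depends upon a random quantity, $a$. Because (12) increases monotonically with $a$, and the median of $a$ is unity (since $w^*$ is the median of $\hat w$), the median of $W$ is obtained by setting $a=1$. This gives the following estimate for $\overline W$:
$$\overline W\simeq w^*\left[1+\frac{\gamma-1}{2-\gamma}\ln 2\right] \tag{13}$$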
This indicates that $\overline W$ exceeds the median of the largest term by a factor which is independent of both $\epsilon$ and $N$ (and which is therefore independent of $R$). The independence of $\overline W/w^*$ upon $R$ indicates that the filtered images are scale-invariant. The fact that this ratio does not depend upon $\epsilon$ reflects the fact that the images are dominated by the largest values of $w_i$. Equation (13) implies that the number of $w_i$, including the largest one, that make a significant contribution to $W$ is $1+\frac{\gamma-1}{2-\gamma}\ln 2$. When $\gamma\to 1$, there is likely to be only one $w_i$ that dominates the filtered image. This is in accord with the large jump principle, discussed in [5].
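The closed-form estimates can be evaluated directly. The sketch below (function names are illustrative) implements equations (9), (10) and (13), together with the number of significant contributors $1+\ln 2\,(\gamma-1)/(2-\gamma)$. For $N=1000$ it gives approximately $1.58\times10^{63}$ at $\gamma=1.05$ and $3.0\times10^{4}$ at $\gamma=1.95$, which shows how strongly $\overline W$ depends on $\gamma$.

```python
import math


def median_largest(N, gamma):
    """Equation (9): w* ~ (N / ln 2)**(1 / (gamma - 1))."""
    return (N / math.log(2.0)) ** (1.0 / (gamma - 1.0))


def significant_contributors(gamma):
    """1 + ln 2 (gamma - 1) / (2 - gamma): effective number of terms dominating W."""
    return 1.0 + math.log(2.0) * (gamma - 1.0) / (2.0 - gamma)


def median_total_weight(N, gamma):
    """Equation (13): W_median ~ w* [1 + ln 2 (gamma - 1) / (2 - gamma)]."""
    return median_largest(N, gamma) * significant_contributors(gamma)


def filtered_count(eps, gamma):
    """Equation (10): number of weights between eps * w* and w*, for small eps."""
    return math.log(2.0) * eps ** (-(gamma - 1.0))


for gamma in (1.05, 1.5, 1.95):
    print(gamma, median_total_weight(1000, gamma), significant_contributors(gamma))
```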
The prediction for $\overline W$, equation (13), was tested numerically. Figure 2 shows the ratio of the empirically determined values of $w^*$ and $\overline W$ to the theoretical estimates, equations (9) and (13), for $N=1000$ with $M=10^4$ realisations. This verifies equation (9), and shows that the $N$-dependence of $\overline W$ is the same as that of $w^*$. The values of $\overline W$ used to create figure 2 span many decades: theoretical values of $\overline W$ (with $N=1000$) range from $1.58\ldots\times10^{63}$ for $\gamma=1.05$ to $2.99\ldots\times10^{4}$ at $\gamma=1.95$. Given this very wide range of values, figure 2 demonstrates that equation (13) is a useful approximation.
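A reduced-size version of this test might look as follows. It assumes the inverse-transform sampler for (1) and uses smaller $N$ and $M$ than figure 2 to keep memory modest; values of $\gamma$ very close to 1 are avoided because $u^{-1/(\gamma-1)}$ overflows double precision when the uniform variate $u$ is tiny. The printed ratios of simulated to theoretical $\overline W$ should be of order unity.

```python
import numpy as np


def median_total_weight_theory(N, gamma):
    """Equation (13), with w* taken from equation (9)."""
    w_star = (N / np.log(2.0)) ** (1.0 / (gamma - 1.0))
    return w_star * (1.0 + np.log(2.0) * (gamma - 1.0) / (2.0 - gamma))


rng = np.random.default_rng(3)
N, M = 500, 5000          # smaller than the N = 1000, M = 10**4 used for figure 2
ratios = {}
for gamma in (1.2, 1.5, 1.8):
    w = (1.0 - rng.random((M, N))) ** (-1.0 / (gamma - 1.0))   # M samples of N weights
    W_median = np.median(w.sum(axis=1))                         # empirical median of W
    ratios[gamma] = W_median / median_total_weight_theory(N, gamma)
    print(gamma, ratios[gamma])
```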
Figure 3 shows the expectation value of the fraction $f_k$ of the contribution to $W$ from the largest $k$ samples, as defined by (6), as a function of $\gamma$. This verifies that, in a typical realisation, most of the contribution to $W$ comes from a small number of the largest $w_i$. The fractional contribution approaches unity as $\gamma\to 1$, in accord with the large jump principle [5].

Fig. 3 Mean values of the fraction of the sum $W$ contained in its largest $k$ elements (defined in equation (6)), as a function of $\gamma$. The number of elements of the sum was $N=1000$, and there were $M=10^4$ realisations.

We remark that there is a further level of self-similarity in our power-law model, which is concerned with varying the exponent $\gamma$. Because equation (1) implies that $y=(\gamma-1)\ln w$ has a PDF proportional to $\exp(-y)$ for $y>0$, the ensembles for different values of $\gamma$ are equivalent if we replace $w$ by $(\gamma-1)\ln w$.
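The equivalence can be checked numerically. With the sampler $w=u^{-1/(\gamma-1)}$, the variable $y=(\gamma-1)\ln w=-\ln u$ should be exponentially distributed with unit mean and unit variance whatever the value of $\gamma$. A minimal sketch:

```python
import numpy as np

rng = np.random.default_rng(4)
stats = {}
for gamma in (1.2, 1.8):
    u = 1.0 - rng.random(200_000)        # uniform on (0, 1]
    w = u ** (-1.0 / (gamma - 1.0))      # weights drawn from (1)
    y = (gamma - 1.0) * np.log(w)        # should be Exp(1) for every gamma
    stats[gamma] = (y.mean(), y.var())
    print(gamma, stats[gamma])
```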
Our most general conclusion from this calculation follows from equation (13). When extended to $d$ dimensions, where $N\sim\rho R^d$, equations (9) and (13) give
$$\ln\overline W=\frac{d}{\gamma-1}\ln R+\text{const.} \tag{14}$$
so that the apparent dimension $D_{\rm eff}$ which characterises the scale invariance is given by equation (5). Note that $D_{\rm eff}>d$. We say that this scale invariance is ultradimensional.
It is clearly distinguished from the self-similarity of fractal sets, where the dimension D satisfies D < d.
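A sketch of the $d$-dimensional scaling, assuming (as a simplification) a fixed number $N=\rho R^d$ of hotspots per square rather than a Poisson-distributed count. The fitted slope of $\ln\overline W$ against $\ln R$ should approach $d/(\gamma-1)$; the parameters are arbitrary illustrative choices.

```python
import numpy as np


def median_total_weight(R, d, rho, gamma, M, rng):
    """Median over M realisations of W for N = rho * R**d weights drawn from (1)."""
    N = int(round(rho * R**d))   # simplification: fixed count instead of Poisson
    w = (1.0 - rng.random((M, N))) ** (-1.0 / (gamma - 1.0))
    return np.median(w.sum(axis=1))


rng = np.random.default_rng(5)
d, rho, gamma, M = 2, 1.0, 1.5, 4000
R = np.array([8, 16, 32])
W = np.array([median_total_weight(r, d, rho, gamma, M, rng) for r in R])
slope = np.polyfit(np.log(R), np.log(W), 1)[0]   # estimate of D_eff
print(slope, d / (gamma - 1.0))
```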
3 Diffusion model

Defining the model
We now consider a physically motivated example of a distribution of hotspots: the equilibrium probability density for diffusion in a random potential, $V(\mathbf x)$. This example will exhibit an approximate, rather than exact, scale invariance. Motion of a particle is determined by a stochastic differential equation:
$$\delta x_i=-\mu\frac{\partial V}{\partial x_i}\,\delta t+\sqrt{2D}\,\delta\eta_i(t) \tag{15}$$
where $\mu$ is the mobility and $\delta\eta_i(t)$ are white noise signals, independent at each timestep, satisfying $\langle\delta\eta_i\rangle=0$ and $\langle\delta\eta_i\,\delta\eta_j\rangle=\delta_{ij}\,\delta t$. In the following we set $\mu=1$ throughout. When $V=$ const, the motion is simple diffusion with the diffusion coefficient $D$. The equilibrium probability density function for the stochastic process (15) is
$$P(\mathbf x)=\frac{1}{Z}\exp\left[-\frac{V(\mathbf x)}{D}\right],\qquad Z=\int\mathrm{d}^2x\,\exp\left[-\frac{V(\mathbf x)}{D}\right] \tag{16}$$
where $Z$ is the partition function. We shall assume that motion is confined to a finite but very large region (which we take to be a square with side $R$). When $D$ is small, this density is very strongly concentrated in minima of the potential $V(\mathbf x)$. Our aim will be to characterise the function $F$, defined by equation (3), for this model. Consider the equilibrium measure when the potential $V(\mathbf x)$ is itself a smoothly varying random function, with a Gaussian PDF, and statistics which are homogeneous and isotropic. We shall assume that $V(x,y)$ has the following statistical properties:
$$\langle V\rangle=0,\qquad\langle V^2\rangle=1,\qquad\langle V_x^2\rangle=\langle V_y^2\rangle=1 \tag{17}$$
where $V_x=\partial V/\partial x$, etc. These requirements can be satisfied by re-scaling the coordinates and the potential. Also define $c$ by writing
$$\langle V_{xx}^2\rangle=\langle V_{yy}^2\rangle=3c,\qquad\langle V_{xy}^2\rangle=\langle V_{xx}V_{yy}\rangle=c \tag{18}$$
This parameter satisfies $c\ge 1/2$, with the lower limit realised if the spectral function $S(\mathbf k)$ of $V(\mathbf x)$ (the Fourier transform of its autocorrelation function) has a ring spectrum, $S(\mathbf k)\propto\delta(|\mathbf k|-k_0)$. If the correlation function of $V$ is a Gaussian, then $c=1$.
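The stochastic equation (15) can be integrated with the Euler–Maruyama scheme. The sketch below is not the random-potential simulation itself: it uses a hypothetical harmonic potential $V=x^2/2$, for which the equilibrium density (16) is Gaussian with variance $D$, so that the output can be compared with a known answer. The time step and the number of particles are illustrative.

```python
import numpy as np


def euler_maruyama(grad_V, x0, D, dt, n_steps, rng):
    """Integrate dx = -grad V(x) dt + sqrt(2 D) d(eta), with mobility mu = 1.

    x0 holds one coordinate per independent particle; final positions are returned.
    """
    x = np.array(x0, dtype=float)
    for _ in range(n_steps):
        noise = rng.standard_normal(x.shape)   # independent unit-variance increments
        x += -grad_V(x) * dt + np.sqrt(2.0 * D * dt) * noise
    return x


# Hypothetical test case: V(x) = x**2 / 2, so exp(-V/D) is Gaussian with variance D
rng = np.random.default_rng(5)
D, dt = 0.3, 0.01
x_final = euler_maruyama(lambda x: x, np.zeros(5000), D, dt, 1000, rng)
print(x_final.var(), D)
```

The discretisation slightly inflates the stationary variance (by a factor of about $1/(1-\delta t/2)$ here), which is negligible for the chosen $\delta t$.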
In general, the value of $Z$ depends upon the realisation of the potential $V(\mathbf x)$, but if the scale size $R$ of the region is sufficiently large, we can use ergodicity and approximate $Z$ by its expectation value, $\langle Z\rangle$. The partition function is then approximated by
$$Z\approx\langle Z\rangle=R^2\left\langle e^{-V/D}\right\rangle=R^2\exp\left(\frac{1}{2D^2}\right) \tag{19}$$
When $D$ is sufficiently small that the measure (16) is concentrated at the minima of $V(\mathbf x)$, the weight of a hotspot is approximated by
$$w\approx\frac{2\pi D}{Z\sqrt{\Delta}}\exp\left(-\frac{V^*}{D}\right) \tag{20}$$
where $V^*$ is the height of the minimum, and $\Delta=V_{xx}V_{yy}-V_{xy}^2$ is the determinant of the Hessian matrix at the minimum.

Fig. 4 Equilibrium probability density for diffusion in a two-dimensional random potential, as specified by (17) and (18) with a Gaussian correlation function, $\langle V(\mathbf r)V(\mathbf r')\rangle=\exp(-|\mathbf r-\mathbf r'|^2/2)$, so that $c=1$ in (18). The presentation is the same as in figure 1.
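The Laplace estimate (20) can be checked against direct quadrature. In the sketch below the potential, a quadratic minimum with Hessian entries $V_{xx}=2$, $V_{yy}=1$, $V_{xy}=0.5$ plus a quartic correction, is a hypothetical test case; the factor $1/Z$ is common to both sides and is omitted. The ratio of the numerical integral to $2\pi D\,e^{-V^*/D}/\sqrt{\Delta}$ should approach 1 as $D$ decreases.

```python
import numpy as np


def laplace_weight(D, V_star, hessian):
    """Unnormalised hotspot weight 2 pi D exp(-V*/D) / sqrt(det H); the 1/Z factor is omitted."""
    delta = np.linalg.det(hessian)
    return 2.0 * np.pi * D * np.exp(-V_star / D) / np.sqrt(delta)


def well(x, y, V_star=-1.0):
    """Hypothetical potential: minimum V* at the origin, Hessian H below, plus a quartic term."""
    return V_star + 0.5 * (2.0 * x**2 + y**2 + 2.0 * 0.5 * x * y) + 0.25 * (x**4 + y**4)


H = np.array([[2.0, 0.5], [0.5, 1.0]])
h = 0.005
s = np.arange(-2.0, 2.0 + h / 2, h)
X, Y = np.meshgrid(s, s, indexing="ij")
ratios = {}
for D in (0.05, 0.02, 0.01):
    numeric = np.exp(-well(X, Y) / D).sum() * h * h   # rectangle-rule integral of exp(-V/D)
    ratios[D] = numeric / laplace_weight(D, -1.0, H)
    print(D, ratios[D])
```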
Figure 4 illustrates the distribution of the weights of the hotspots of this diffusion model, using the same presentation as figure 1 (each hotspot is represented by a filled circle with area proportional to its weight, equation (20), and the total area of the circles is normalised to 1%). We used two different diffusion coefficients $D$ and length scales $R$. The distributions are qualitatively similar to those of the simplified model, shown in figure 1.
When $D$ is small, the weights of the hotspots have a very broad distribution. The expectation value $\langle w\rangle$ is dominated by extremely rare events, which are unlikely to be realised, and it is more useful to estimate the median $\overline W$ of the total weight inside a region of area $R^2$. The growth of $\overline W$ as a function of $R$ is characterised by calculating the function $F$ defined by equation (3): $\ln\overline W=F(\ln R)$. It will be argued that, for this model, the large-jump principle [5] is applicable, so that $\overline W$ is well approximated by the median of its largest contributor, denoted by $w^*$.
We shall consider the following scenario. The potential $V(\mathbf x)$ is evaluated, and the weights (20) calculated, in a region of size $R$. While $R$ is assumed to be large, we assume that $D$ is sufficiently small that $Z\ll\langle Z\rangle$, so that the largest weight is $\hat w\approx 1$. This implies that when we estimate $\overline W(R)$, our estimate should satisfy $\overline W(R)=1$. We assume that the density of minima of $V(\mathbf x)$ is $\rho$. According to equation (20), a large value of $w$ is associated with a minimum of the potential $V$ which has an approximate depth $-D[\ln w+\ln Z]$, and we find it convenient to use the variable
$$\tilde V=-D\left[\ln w+\ln Z\right] \tag{21}$$
instead of $w$, because the distribution of weights has a narrow support when expressed in terms of $\tilde V$. The largest values of $w$ are observed very rarely, so we shall characterise the density of hotspots with very large values of $w$ as follows: the probability $P(\tilde V_0)$ that $\tilde V$ is less than $\tilde V_0$ is written in the form
$$P(\tilde V_0)=\exp\left[-J(\tilde V_0)\right] \tag{22}$$
(in order to unambiguously normalise this distribution we regard any minimum of $V(\mathbf x)$ as being a hotspot). The function $J(\tilde V)$ corresponds to a 'rate function' or 'entropy function' of large deviation theory [6]. We can then estimate the median of the smallest value of $\tilde V$, denoted by $\tilde V^*$, by writing $1/2=[1-P(\tilde V^*)]^{\rho R^2}$, where $\rho$ is the density of minima. For large $\rho R^2$ this yields
$$J(\tilde V^*)=\ln\left(\frac{\rho R^2}{\ln 2}\right)=2\ln R+\ln\left(\frac{\rho}{\ln 2}\right) \tag{23}$$
If the inverse function of $J$ is $K$ (that is, $K(J(\tilde V))=\tilde V$), then the required relation between $w^*=\overline W$ and $\ln R$ is
$$\ln\overline W=-\frac{1}{D}\,K\!\left(2\ln R+\ln\frac{\rho}{\ln 2}\right)-\ln Z \tag{24}$$
In order to use this expression to determine the function $F(\ln R)$ which appears in equation (3), we must determine the large-deviation rate function $J(\tilde V)$ which was introduced in equation (22). Note that, according to equations (20) and (21),
$$\tilde V=V^*-D\ln\left(\frac{2\pi D}{\sqrt{\Delta}}\right)$$
so that, in the limit as $D\to 0$, $\tilde V\to V^*$, and it is sufficient for our purposes to determine the PDF of the heights of minima of the function $V(x,y)$. We can, therefore, use the cumulative probability of the heights of local minima as the function $P$ in equation (22). Equations (8) and (24) imply that
$$\gamma_{\rm eff}=1-\frac{D}{K'\!\left(2\ln R+\ln(\rho/\ln 2)\right)}$$
so that $\gamma_{\rm eff}\sim 1$ when $D\to 0$ (note that $K$ is a decreasing function, so $K'<0$ and $\gamma_{\rm eff}>1$). By equation (13), the ratio $\overline W/w^*$ approaches unity as $\gamma\to 1$, so this observation justifies the claim that $\overline W\sim w^*$.
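To illustrate how equation (23) is used, the sketch below inverts a rate function by bisection. As a purely hypothetical toy, the heights are taken to be standard normal, so that $J(V)=-\ln\Phi(V)$; this is not the distribution of minima of $V(x,y)$, which must be obtained as described in the next subsection. The root of $J(\tilde V^*)=\ln(n/\ln 2)$, with $n$ playing the role of $\rho R^2$, is compared with the simulated median of the smallest of $n$ samples.

```python
import math

import numpy as np


def J_gaussian(V):
    """Hypothetical toy rate function J(V) = -ln P(V) for standard-normal heights."""
    return -math.log(0.5 * math.erfc(-V / math.sqrt(2.0)))


def median_smallest_height(J, n_minima, lo=-20.0, hi=20.0, iters=200):
    """Solve J(V*) = ln(n_minima / ln 2) by bisection; J decreases as V increases."""
    target = math.log(n_minima / math.log(2.0))
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if J(mid) > target:
            lo = mid   # J too large means P too small: V* lies above mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


rng = np.random.default_rng(6)
n_minima, M = 1000, 4000     # n_minima plays the role of rho * R**2
V_star = median_smallest_height(J_gaussian, n_minima)
V_sim = np.median(rng.standard_normal((M, n_minima)).min(axis=1))
print(V_star, V_sim)
```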

Distribution of weights
We now turn to evaluating the distribution of heights of minima. The two-dimensional case is quite technical, so we shall start by discussing the estimate of $\overline W(R)$ in one dimension.
Here we require the density of local minima, $\rho$, and the probability $P(V)$ that the height of a local minimum is less than $V$. These are readily obtained using the approach developed by Rice [7], following pioneering work by Kac [8]. The density of minima is
$$\rho=\int_{-\infty}^{\infty}\mathrm{d}V\int_{0}^{\infty}\mathrm{d}V''\,V''\,P(V,0,V'')$$
where $P(V,V',V'')$ is the joint PDF of $V$ and its first two derivatives, evaluated at the same point. We consider the case where $V(x)$ is Gaussian, with correlation function $C(x)=\langle V(x_0)V(x_0+x)\rangle$. We find the following non-zero statistics of the potential and its derivatives at a given point:
$$\langle V^2\rangle=C(0),\qquad\langle V'^2\rangle=-C''(0),\qquad\langle V''^2\rangle=C''''(0),\qquad\langle VV''\rangle=C''(0)$$
Using the standard formula for multivariate Gaussian distribution, we find […]

[…] of the potential were identified, together with the values of $V(x_i)$ and $V''(x_i)$. For each realisation, we divided the interval into sub-intervals, halving the length each time, for 18 generations. At generation $k=1,\ldots,18$, for each of the $M\times 2^{k-1}$ sub-intervals of length $R=L/2^{k-1}$, we summed the weights $w$ to determine the total weight $W$ of each sub-interval. We then determined the median value, $\overline W$, of these $M\times 2^{k-1}$ weights. In figure 7 we plot the resulting 18 values of $\ln\overline W$ as a function of $\ln R$, for several different values of the diffusion coefficient $D$.
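The dyadic-subdivision procedure can be sketched as below. Several details are assumptions, not taken from the paper: the field is generated on a periodic grid by convolving white noise with a Gaussian kernel chosen so that $\langle V(x)V(x')\rangle=\exp[-(x-x')^2/2]$ (unit variance and unit $\langle V'^2\rangle$, the one-dimensional analogue of (17)); minima are located on the grid; curvatures come from finite differences; and the one-dimensional Laplace weight $\sqrt{2\pi D/V''}\,e^{-V/D}$ is used with the constant $\ln Z$ omitted. The grid density of minima can be compared with Rice's value $\sqrt{3}/(2\pi)$ for this correlation function.

```python
import numpy as np


def gaussian_field_1d(n, h, rng):
    """Periodic Gaussian field with <V(x) V(x')> = exp(-(x - x')**2 / 2), grid spacing h.

    White noise is circularly convolved with g(x) = (2/pi)**0.25 exp(-x**2), whose
    self-convolution is exp(-x**2 / 2); the factor sqrt(h) gives unit variance.
    """
    x = np.arange(n) * h
    dist = np.minimum(x, n * h - x)                   # periodic distance from x = 0
    g = (2.0 / np.pi) ** 0.25 * np.exp(-dist**2)
    xi = rng.standard_normal(n)
    return np.fft.irfft(np.fft.rfft(xi) * np.fft.rfft(g), n) * np.sqrt(h)


def minima_log_weights(V, h, D):
    """Grid indices of local minima and ln[sqrt(2 pi D / V'') exp(-V / D)] (ln Z omitted)."""
    left, right = np.roll(V, 1), np.roll(V, -1)       # V[i-1] and V[i+1], periodic
    idx = np.flatnonzero((V < left) & (V < right))
    curv = (left[idx] - 2.0 * V[idx] + right[idx]) / h**2   # finite-difference V''
    logw = 0.5 * np.log(2.0 * np.pi * D / curv) - V[idx] / D
    return idx, logw


def dyadic_medians(realisations, n, h, generations):
    """ln of the median total weight of sub-intervals of length R = L / 2**(k-1)."""
    R_values, logW = [], []
    for k in range(1, generations + 1):
        m = 2 ** (k - 1)                              # sub-intervals per realisation
        totals = [np.bincount(idx // (n // m), weights=np.exp(lw), minlength=m)
                  for idx, lw in realisations]
        R_values.append(n * h / m)
        logW.append(np.log(np.median(np.concatenate(totals))))
    return np.array(R_values), np.array(logW)


rng = np.random.default_rng(7)
n, h, D, M = 2**16, 0.05, 0.3, 4
realisations = [minima_log_weights(gaussian_field_1d(n, h, rng), h, D) for _ in range(M)]
density = sum(len(idx) for idx, _ in realisations) / (M * n * h)
print(density, np.sqrt(3.0) / (2.0 * np.pi))          # grid estimate vs Rice density
R_values, logW = dyadic_medians(realisations, n, h, generations=8)
print(np.column_stack([np.log(R_values), logW]))
```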
Because the values of $R$ and $D$ were chosen so that the largest weights $w_i$ were of order unity, equation (24) simplifies to an expression for $\ln\overline W$ in which $\rho$ is the density of minima ((30) in one dimension, (35) in two dimensions). Figure 8 verifies this expression by showing a collapse of the data in figure 7 onto $K(x)$, the inverse function of the large-deviation entropy, in the one-dimensional case.
We generated $M=4$ realisations of $V(x,y)$ on a square of size $R=256$, with toroidal boundary conditions, by convolving a discrete representation of white noise with a Gaussian kernel. Figure 9 displays plots of $\ln\overline W$ as a function of $\ln R$ for the two-dimensional Gaussian potential, with different values of the diffusion coefficient $D$, for $R=2,4,\ldots,256$. Figure 10 illustrates the collapse of these data onto a plot of $K(x)$, the inverse of the large-deviation rate function $J(x)$. The range of values of $R$ is much smaller than that shown in figures 7 and 8, because the two-dimensional simulations are more numerically demanding.
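A two-dimensional counterpart, under the same kind of assumptions: the kernel $\sqrt{2/\pi}\,e^{-r^2}$ is a hypothetical choice giving $\langle V(\mathbf r)V(\mathbf r')\rangle=\exp(-|\mathbf r-\mathbf r'|^2/2)$ on a torus; minima are grid cells lower than their eight neighbours; the Hessian determinant $\Delta$ in (20) is estimated by finite differences; and $\ln Z$ is omitted. The grid size and spacing are illustrative and differ from those used for figure 9.

```python
import numpy as np


def gaussian_field_2d(n, h, rng):
    """Periodic n x n Gaussian field with <V V> = exp(-r**2 / 2), grid spacing h."""
    x = np.arange(n) * h
    dist = np.minimum(x, n * h - x)                     # periodic distance from the origin
    r2 = dist[:, None] ** 2 + dist[None, :] ** 2
    g = np.sqrt(2.0 / np.pi) * np.exp(-r2)              # self-convolution is exp(-r**2 / 2)
    xi = rng.standard_normal((n, n))                    # one white-noise value per cell
    return np.fft.irfft2(np.fft.rfft2(xi) * np.fft.rfft2(g), s=(n, n)) * h


def minima_log_weights_2d(V, h, D):
    """Grid minima (8 neighbours, toroidal) and ln[2 pi D exp(-V/D) / sqrt(Delta)], ln Z omitted."""
    is_min = np.ones(V.shape, dtype=bool)
    for dx in (-1, 0, 1):
        for dy in (-1, 0, 1):
            if dx or dy:
                is_min &= V < np.roll(V, (dx, dy), axis=(0, 1))
    i, j = np.nonzero(is_min)
    n = V.shape[0]
    ip, im, jp, jm = (i + 1) % n, (i - 1) % n, (j + 1) % n, (j - 1) % n
    Vxx = (V[ip, j] - 2.0 * V[i, j] + V[im, j]) / h**2
    Vyy = (V[i, jp] - 2.0 * V[i, j] + V[i, jm]) / h**2
    Vxy = (V[ip, jp] - V[ip, jm] - V[im, jp] + V[im, jm]) / (4.0 * h**2)
    delta = Vxx * Vyy - Vxy**2                          # Hessian determinant
    ok = delta > 0.0                                    # discard degenerate grid minima
    logw = np.log(2.0 * np.pi * D / np.sqrt(delta[ok])) - V[i[ok], j[ok]] / D
    return np.column_stack([i[ok], j[ok]]), logw


def median_log_total(pos, logw, n, m):
    """ln of the median total weight over the m x m sub-squares of one realisation."""
    block = n // m
    cell = (pos[:, 0] // block) * m + pos[:, 1] // block
    totals = np.bincount(cell, weights=np.exp(logw), minlength=m * m)
    return np.log(np.median(totals))


rng = np.random.default_rng(8)
n, h, D = 512, 0.25, 0.3                  # side n * h = 128 correlation lengths
V = gaussian_field_2d(n, h, rng)
Vx = (np.roll(V, -1, axis=0) - np.roll(V, 1, axis=0)) / (2.0 * h)   # central difference
pos, logw = minima_log_weights_2d(V, h, D)
side = {m: n * h / m for m in (1, 2, 4, 8, 16)}
logW = {m: median_log_total(pos, logw, n, m) for m in side}
print(V.var(), Vx.var(), len(logw))
for m in side:
    print(np.log(side[m]), logW[m])
```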

Concluding remarks
Images which show the distribution of 'hotspots', where a field has an unusually high intensity, appear to have a family resemblance, which may not be strongly dependent upon the size of the sample region. The distribution of weights of hotspots was characterised by considering the median $\overline W$ of the total weight in a region of size $R$, and defining a function $F(x)$ by writing $\ln\overline W=F(\ln R)$ (equation (3)). The derivative of $F$ is an effective dimension, $D_{\rm eff}=F'(\ln R)$.
We investigated two models for hotspot distributions. Firstly, we considered a one-parameter family of weight distributions, defined by (1), which were contrived to be scale-invariant. The scale invariance of these models is characterised by an effective dimension $D_{\rm eff}=d/(\gamma-1)$, where $\gamma\in(1,2)$ is the parameter in the definition of the model. Because this dimension is greater than that of the embedding space, the scale invariance is distinct from the self-similarity which characterises fractal sets. Examples of realisations of this model are shown in figure 1. While it is a mathematical fact that the individual images are drawn from the same ensemble, the realisations do look very different from each other.
We also considered a physically motivated example, namely the equilibrium distribution for diffusion in a random potential. Here the realisations of the hotspot distribution, illustrated in figure 4, are qualitatively similar to those of the simplified model. We were able to determine the function $F(x)$ for this model. Because it is not a linear function, this system does not exhibit strict scale invariance.

Fig. 1 Illustration of scale independence of the 'hotspots' model. At the position of each hotspot there is a filled circle with area proportional to its weight. The total area of the circles in each image is normalised to be 1% of the total area of the image. The images use the probability distribution (1) with γ = 5/3, and the scale factors are R = 10000, 2000, 400 and 80 (with three cases of each scale factor). The different images cannot be associated with the different values of R by any statistical test, reflecting the scale-invariance property. The scale factors are: R = 10000, panels (a), (g), (k); R = 2000, panels (b), (f), (h); R = 400, panels (d), (e), (l); R = 80, panels (c), (i), (j).

Fig. 2 Plot of ratios of values of $w^*$ (median of the largest element) and $\overline W$ (median of the sum of $N$ samples) obtained from simulation, divided by their theoretical estimates, equations (9) and (13) respectively, as a function of γ. The figure shows data for N = 1000, obtained from M = 10^4 realisations. The values of $\overline W$ used to generate this figure span more than 58 decades.

Fig. 7 Numerical investigation of the function F(x) defined in equation (3) for the one-dimensional model with Gaussian correlation function: $\ln\overline W$ is evaluated as a function of $\ln R$, for a range of different values of D.

Fig. 9 Numerical investigation of the function F(x) defined in equation (3) for the two-dimensional model with Gaussian correlation function: $\ln\overline W$ is evaluated as a function of $\ln R$, for a range of different values of D.

Fig. 10 The data in figure 9 collapse onto the function K(x) plotted in figure 6(b), in accord with equation (37). Note that the assumptions behind the asymptotics (37) break down at ln R < 2.5, but the data still collapse even at smaller R.