A general approach to evaluate the ensemble cross-correlation response for PIV using Kernel density estimation

Cross-correlation in particle image velocimetry is well known to behave as a non-linear operator, depending heavily on the distribution of tracer images and image quality. While analytical descriptors of the correlation response have so far been dealt with for simplistic flow cases, in this work a methodology is presented based on Kernel density estimation to retrieve the inherent correlation response to any deterministic flow field. The new approach bypasses the need for Monte-Carlo simulations and its inherent sensitivity to parameter settings make it a more efficient alternative to analyse filtering of the underlying velocity field due to image cross-correlation. The derivation of the underlying equations is presented and a numerical assessment corroborates the suitability of the approach to mimic ensemble correlation.Graphical abstract


Introduction
Cross-correlation is by far the most common statistical operator to evaluate tracer displacements in particle image velocimetry (PIV) because of its robustness and computational efficiency when implemented as a sequence of Fourier transforms. By relating the single velocity vector for each interrogation window to the dominant correlation peak, cross-correlation is know to return only a low-pass filtered representation of the underlying flow field in that flow scales smaller than the interrogation windows cannot be retrieved. This in turn affects the spatial resolution (e.g. Spencer and Hollis 2005;Lavoie et al. 2007;Kähler et al. 2012), for which reason many studies have been devoted in overcoming this drawback by improving the involved image analysis processes through, among others, windowing functions (e.g. Nogueira et al. 2005;Astarita 2007;Schrijer and Scarano 2008), alteration of the interrogation area (e.g. Scarano 2003;Di Florio et al. 2002), accounting of multiple correlation peaks (e.g. Masullo and Theunissen 2018) or even adaptive window placement and sizing (e.g. Theunissen et al. 2007).
Although the response of cross-correlation is pivotal in the estimation of PIV uncertainty (Sciacchitano et al. 2015) and the extraction of flow statistics (Liberzon et al. 2012), the inherent cross-correlation response has been the focus of relatively few studies. Many uncertainty analyses consider the variation in obtained displacement estimates, for example, but do not reflect how well the velocity vector represents the underlying flow field. While analytical descriptors of the correlation's transfer function have been documented, these were limited to simplistic cases such as uniform flow (Keane and Adrian 1992), shear and strain flow (Westerweel 2008) and sinusoidal displacements (Theunissen 2012). For more general synthetic flows, such analyses are typically limited to Monte-Carlo simulations, whereby numerous PIV images are generated either on the basis of numerical simulations or available mathematical definitions (e.g Lecordier et al. 2001;Lecordier and Westerweel 2004), or through dedicated experiments mimicking the flow condition of interest (e.g. Rahgozar et al. 2013).
Monte-Carlo simulations offer the benefit of yielding systematic errors and quantifying random errors. However, these random errors are, besides the underlying displacement distribution, dependent on seeding density, particle image diameter, etc. In particle image velocimetry flow fields are sampled at tracer locations. The distribution of particles captured in each instantaneous PIV recording will thus constitute only one of many possible realisations. An example is the appearance of so-called peak splitting (Jimenez et al. 2016). This causes the retrieved (instantaneous) cross-correlation map, and consequently also the resulting displacement vector, at each interrogation location to vary even when the flow field remains unchanged. To obtain a statistically meaningful displacement estimate, a large number of PIV recordings are, therefore, required to allow the averaging process to converge and minimise the associated uncertainty. This makes it less straightforward in Monte-Carlo simulations to decouple the effect of altering tracer patterns or image noise from purely the inherent cross-correlation response.
In this work, a methodology is described to obtain an invariant ensemble cross-correlation map for a given constant flow field, linked solely to the underlying displacement field. The approach involves solving the mathematical representation of PIV image cross-correlation without the need of Monte-Carlo simulations, bypassing the associated ambiguities. To this extent, the paper starts by recapitulating the correlation theory set out in Adrian (1988) and Westerweel (2008). The concept of Kernel density estimation (KDE) to retrieve the probability density of captured displacements is elaborated in Sect. 3 and implemented in Sect. 4. Digitization is addressed in Sect. 5 where incorporation of iterative image deformation is discussed. Finally, a numerical assessment corroborates the applicability of the method to replicate ensemble correlation maps.
It should be noted that the objective will not be to discourage the use of Monte-Carlo simulations, but to present a more efficient process for the general evaluation of the displacement modulation characterising cross-correlation. Besides providing a better understanding of cross-correlation responses, the methodology will be extremely suited to obtain the resulting (ensemble-averaged) velocity field if PIV were to be used to measure the flow field predicted with steady CFD simulations (e.g. RANS); assuming the CFD result is an accurate representation of the real flow, even in the absence of experimental errors, the non-linear behaviour of cross-correlation will yield a filtered analogue causing the numerical and PIV velocity fields to differ. A-priori filtering of the CFD data enables discrepancies between numerical and empirical PIV data to be attributed to misconceptions in the actual simulations or due to the mere correlation-inherent filtering, allowing an improved validation of CFD codes.

Spatial cross-correlation statistics
The point of departure is the equation of the ensemble mean of the continuous spatial correlation reported in Adrian (1988) and Westerweel (2008), where it is assumed that (1) the camera's optical axis and light sheet are orthogonal, (2) the light sheet intensity varies only along the out-of-plane direction, (3) laser sheets belonging to different laser pulses remain in the same plane and (4) the laser sheet thickness d z is smaller than the depth-of-focus such that all illuminated tracers are in focus. These assumptions do not restrict the generality of the equation, but are reflective of good experimental practice. Contrary to the original works, the current authors will allow particle image diameters to vary in the following: ⟨R D ( )� ⟩ is the displacement correlation and represents the conditional ensemble average of the correlation of fluctuating image intensities subject to the flow field (1) ( , t) = (u, v, w)( , t). 1 This velocity field is expressed in terms of the spatial coordinates in the flow domain = (X, Y, Z) and is known from, e.g., CFD simulations. In the ensemble averaging process, the flow field remains fixed while considering variations in tracer patterns C ′ 1 and C ′ 2 . The tracer patterns are projected onto the image domain, which is defined by the spatial coordinates = (x, y) . In line with good experimental practice, light sheets have been assumed to be sufficiently thin such that imaging can be considered to be paraxial and the projection of the tracer pattern only involves an integration along the out-of-plane direction Z. The magnification M will then be a constant ( x = MX and y = MY ) and the displacement field (defined in the planar image domain) ( ) = ( x , y )( ) is a scaled projection of the flow field onto the image plane.
The displacement component along, e.g., the x-direction is defined by each tracer's location along its respective path . For ideal tracers the particle velocities will be identical to the fluid velocity. Because tracer diameters are typically in the order of 1 μm , the solid volume fraction (ratio between volume of particles and gas) in PIV is much lower than 10 −5 permitting the effect of the tracers on the flow turbulence to be neglected (Elghobashi 1994). The effect of inertial tracers can then be appropriately simulated in the majority of CFD packages through oneway coupled particle dispersion (Greifzu et al. 2016). As per Adrian (1988) x ( ) ≃ Mu( , t) t provided u can be considered to remain constant over the laser pulse separation t , which is satisfied by proper choice of t . In a similar manner, y ( ) ≃ Mv( ) t and the tracers' displacement normal to the laser sheet (out-of-plane) Z ( ) ≃ Z (M ) = w( ) t.
In Eq. 1, K is a proportionality constant dependent on M, and W k defines the extents of the interrogation window centred on k . I ok (Z) refers to illumination intensity profile, which is commonly Gaussian shaped and, based on the second assumption, only a function of Z; I ok (Z) = exp(−8Z 2 ∕d 2 z ) . The thickness of the light sheet d z corresponds to the e −2 width of the profile.
The peak intensity I zk refers to the light intensity scattered by the particles and is for common PIV applications independent of the particle size. Physical particle sizes in PIV applications are typically larger than the laser wavelength ( d p ∕ > 1 ), making the scattered light intensity I T proportional to d 2 p . The total intensity captured by the camera sensor for a single particle image is given by With d 2 = M 2 d 2 p + d 2 s and the diffraction-limited spot diameter d s = 2.44(1 + M)f # , it follows that the image intensity I zk is nearly independent of the particle diameter (Raffel et al. 2018). When applying fluorescent particles I T ∝ d 3 p and, therefore, I zk ∝ d p , in which case the particle diameter can be considered (Appendix 1). Observe that because of the proper camera alignment (cf. assumption 1) the lens' transfer function, and accordingly d s , remains constant across the spatial image coordinate and also along the out-of-plane coordinate (cf. assumption 4).
The normalised particle image intensity distribution is defined by o , which can also be appropriately modelled by a Gaussian (Olsen and Adrian 2001) 3.67 and corresponding e −1∕2 and e −2 widths of approximately 0.36d and 0.74d , respectively. The conditional space-time correlation of the fluctuating tracer patterns is given as (Adrian 1988;Westerweel 1997) After introducing the intensity-weighted particle image selfcorrelation F ( ) = ∫ I z1 I z2 o ( ) o ( + )d and performing the integrations, Eq. 1 becomes 2 Because of the thinness of the light sheets the dependency of displacements on the Z-coordinate can be neglected (see also Sect. 5.4). The last term in Eq. 3 can then be excluded from the integration over Z ′ yielding (3) (4) 1 The instantaneous correlation map , can be split into a mean, ⟨R D ( )� ⟩ , and fluctuating part (Westerweel 2000). The latter consists of the contributions of correlation between random particles. When taking the ensemble average of all correlation maps for a time-invariant velocity field, the contribution of the fluctuating component vanishes and only the particles' self-correlations need to be considered. 2 The ensemble averaging also considers different realisations of particle image diameter. Only the particle image intensity distributions 0 and intensities I zk are inherently functions of the particle image diameters and thus only the convolution F ( , ) will inherit this dependency. Let g( ) represent the probability density function of the particle image diameters , then the expected value of the intensity-weighted image correlation, for a given shift is given as The contribution of the integration along Z ′ has been incorporated in K ′ . A similar form of this equation was presented in Westerweel (2008) albeit currently it is not assumed that all particles have the same image diameter. Instead an ensemble average of the particle image self-correlation has been introduced. The first remaining integration involves a convolution between the ensemble average particle image self-correlation ⟨F ⟩ , which is approximated by exp(−0.5 2 ( d * ) −2 ) , and the displacement distribution.
Variable d * is a representative particle image diameter dictated by the point-spread function of the imaging lens, and remains invariant with . Particle image diameters can still vary across each instantaneous snapshot. From a statistical perspective, it is only assumed that each particle image diameter is drawn from the same unique probability density distribution with a mean equal to d (not d * ). Details regarding the derivation of d * for different particle image diameter probability distributions are provided in Appendix 1. The integral over in Eq. 4 represents a weighted form of the displacement distribution F ,w defined by all possible realisations of the displacement field. In Soria and Willert (2012), this is referred to as the joint probability density

Displacement density estimation
The displacement distribution function F ,w is generally unknown. However, when dealing with analytical velocity fields or CFD data, velocity data (and, therefore, displacement vectors) can be obtained at any spatial location. The second integral in Eq. 4 can at first be approximated by constructing a weighted histogram from N samples of the displacement components (( x ) i , ( y ) i , ( z ) i ) taken at locations (x, y) i , with i = 1, … , N . Weighted histograms are constructed by adding to each bin, the sample's weight, which currently is defined by the out-of-plane motion Z and the intensity weighting defined by the sample's initial ( P 1 ( ) ) and final location ( P 2 ( + ( ) ). This can be improved to a weighted, continuous, kernel density estimate (Wang and Wang 2007) using continuous kernels G; selecting only the n (with n ≤ N ) measurable displacement combinations as per Eq. 5, F ,w can be approximated by The kernel G is commonly chosen to have a unimodal shape centred on zero. While the Epanechnikov kernel yields the lowest mean-square error (Epanechnikov 1968), the popular Gaussian kernel will be shown to be more convenient; The bandwidths h x (n) and h y (n) are smoothing parameters which are more influential than the choice of kernel and must be chosen appropriately. Each bandwidth is solely dependent on the displacement samples i and is defined as h(n) = 0.9 min( w , IQR w ∕1.34)n −1∕5 (Sheather 2004) where w and IQR w are the standard deviation and inter-quartile range of the weighted (viz. Eq. 6) displacement samples; function of the three-dimensional velocity field. While each realisation of the displacement field normally has equal importance, an exponential weighting is introduced depending on the local out-of-plane motion z . This considers the change in the particle images' intensity between snapshots as brighter particles will yield a higher contribution to the correlation map (Nobach and Bodenschatz 2009). The weighting functions W i can be written as a multiplication between a top-hat T i and intensity weighting P i . Independent of the adopted weighting, both are merely functions of the image coordinates . Accordingly, T 1 ( ) will define the interrogation area extents, whereas T 2 ( + � ) factors in the in-plane loss of particle images. Interrogation areas (read T i ) are typically rectangular in shape with respective sizes = (W S,x , W S,y ) . Recalling that denotes the centre of the correlation window, the integrand thus provides a contribution only when i signifies the ith element in the collection of sorted samples and (integer) indices q 1 and q 2 are selected such that t 1 ≤ 0.25 ,

Continuous image cross-correlation
Having obtained an analytical approximant for the inner integral and combined with the expression for the particle image self-correlation, Eq. 4 can be simplified to involve a mere summation of convolutions between Gaussian functions; This expression, in which all constants originating from the convolution are integrated in K ′′ , allows a simplistic numerical evaluation to obtain the correlation function and is only feasible because the Gaussian kernel (Eq. 7) was selected. The bias error in the approximated correlation map R D ( ) is given as (the derivation is included in Appendix 2) where * signifies the convolution operation and in the final step use has been made of the identity (f * g)∕ s = f * g∕ s . According Eq. 11, the systematic error will be proportional to the local second-order gradients in the correlation map whereby strongly curved density functions are over-smoothed by the kernel density estimate and, therefore, underestimated (at a correlation peak, for example, 2 ⟨R D ⟩∕ s 2 = ⟨R D ⟩ �� < 0 ). However, as asserted by a one-dimensional simulation of which the outcome is depicted in Fig. 1, relative errors are only of importance for highly skewed distributions. With the correlation map having a variance of at least the particle image diameter, sufficiently small bandwidths h ∼ (10 −2 ) will render the general error term h 2 2 ⟨R D ⟩ ′′ negligible. To this extent, the bandwidths calculated in paragraph 3 are further scaled down by a factor 10, which is permitted since the number of samples collected will be sufficiently large [ N ∼ (10 4 − 10 5 ) ] to render a representative displacement distribution.

Spectral filtering
With the aim of improving correlation robustness to image noise, spectral filters can be applied of which the symmetric phase-only filter [SPOF; Wernet (2005)] and phase transform [PHAT; Eckstein et al. (2008)] are the most common. Denoting the mean subtracted image intensities of successive snapshots as I ′ 1 and I ′ 2 , respectively, the filtered correlation is defined as where  symbolises the Fourier transform, the notation (⋅) c refers to the complex conjugate and = 1 in case of PHAT and = 1 2 for SPOF. In a similar (11) ds 2 ) at the peak location ( x pk ) with varying skewness and correlation width , assuming a 1D skew normal distribution for the correlation map; | (R D ( ))| . With R D being a sum of weighted Gaussian functions, the normalisation complicates the transform (Appendix 3) and does not lend itself to obtaining a simplified expression for ⟨ R * D ( )| ⟩ . Instead, the inverse Fourier transform will need to be numerically evaluated and for this reason spectral filtering is left out of consideration hereafter.

Analytical expression for digital correlation
The expression for the ensemble-averaged displacement correlation derived thus far considers continuous images. Digital camera sensors are, however, discrete, necessitating the introduction of two additional operators related to pixelisation and discretization. This translates into an additional c o n v o l u t i o n w i t h t h e t r a n s f e r f u n c t i o n ( ) = (x∕ √ p f ) (y∕ √ p f ) of a pixel and sampling at integer pixel shifts (Theunissen 2012). With p f symbolising the pixel fill factor, the triangular function (x∕ Substitution of Eqs. 4, 6 and 7 in Eq. 12 together with the definition of the pixel transfer function and particle image self-correlation ⟨F ⟩ , the discrete correlation map can be written out as Because it is the relative peak amplitudes in ⟨ R D ( )| ⟩ that are of importance, all integration constants have been combined into K ′ .
After several algebraic operations utilsing the expressions for (⋅) , the correlation map amplitude for the shift (s p , r p ) related to digital images is given by where and To recapitulate, Eqs. 14-16 provide a semi-analytical equation of the ensemble-averaged correlation map for a time-invariant displacement field . Out-of-focus effects are neglected and particle images are considered to be Gaussian shaped as per typical PIV conditions. Laser light sheets are assumed to be spatially coincident and oriented orthogonally to the camera's optical axis with normal varying intensity only along this axis. Sheet thicknesses are presumed sufficiently small to warrant a constant magnification and to ensure that the out-ofplane motion varies only normal to the camera's optical axis. These conditions are all in line with good experimental practice. Generality is ensured by accounting for varying particle image diameters and correlation window intensity weighting through the representative particle image diameter d * and kernel weighting w i , cf. Appendix 1 and Eq. 6, respectively.

Systematic error
Akin to conventional correlation maps, the resulting ensemble correlation R D ( ) will be discrete and sub-pixel accuracy is conventionally achieved by Gaussian interpolation. Systematic errors are thus introduced by interpolation (i.e. fitting) inaccuracies, which are dependent on asymmetry in the correlation map. Simultaneously, the bias error in the correlation map was shown to be proportional to the local second-order gradients (Eq. 11) and this is again expected to influence the displacement estimates.
To estimate the bias effect on the retrieved displacement, a similar procedure as per Westerweel (1993) has been adopted. Hereafter, only the s-component will be dealt with as the procedure is identical for the r-component. The asymmetry in the correlation values at the integer displacement locations centred on the correlation peak [assumed to be located at (s p , r p ) = (0, 0) ] for imposed fractional displacements is incorporated by modelling the discrete correlation . To obtain derivatives, in the vicinity of the fractional displacement this discrete correlation map ⟨ R D (s p ) ⟩ is presumed to be Gaussian shaped, 3 K exp(−(s p − ) 2 ∕2a 2 ) , enabling an explicit expression for the second derivatives; The sub-pixel displacement estimate retrieved, based on R D (s p ) is subsequently given by The variation in the sub-pixel error =̃− is depicted in Fig. 2a for the case of d * = 4.5px. Note that = 0 refers to the ideal case where R D ≡ ⟨R D ⟩ and sub-pixel inaccuracy is inherent to the three-point Gaussian fitting, independent of a I . With increasing (and subsequently high values for . Fig. 2 a Evolution in subpixel displacement estimate from R D for particle images of 4.5 px diameter, a pixel fill factor p f = 0.7 and varying bandwidths in the estimation of the displacement distribution ( = 1 2 h 2 x ). b Evolution in sub-pixel displacement estimate from ⟨R D ⟩ ( = 0 in Eq. 17) when adding random noise n to the correlation map Fig. 3 Illustration of image deformation. Original images (black grids top row) are distorted considering a forward or b central difference, yielding the red and blue grids. The lower grids represent the recon-structed pseudo-images with relocated particle images and corrected displacement vectors 3 The continuous correlation map is assumed to be based on a uniform displacement and thus has an e −2 width equal to that of ⟨F ⟩ viz. 4 d . As per Westerweel (1993) the convolution (Eq. 12) between the triangular function (s∕ √ p f ) and continuous correlation map can be approximated by K exp(−(s p − ) 2 ∕2a 2 ) , where a 2 = 1 4 (p f + 4 2 d * 2 ). The amplitude K will be given by K = a I ∕ exp(− 2 ∕2a 2 ).
the bandwidth h x ), the effect of curvature in the correlation map gains importance in its estimate and sub-pixel error increases in magnitude. The derivative ⟨ R D (s p ) ⟩ �� increases with particle image diameter and the effect of curvature can thus be expected to gain importance for smaller particle image widths. For typical bandwidths ∼ (10 −4 ) and the sub-pixel error will be dominated by the bias error inherent to the Gaussian peak fit estimator, which is conventionally estimated at 10 −1 px in average. In Fig. 2b the variation in due to slight deviations in the correlation map ⟨R D ⟩ is presented. Here random noise, drawn from a normal distribution with a variance of 0.3% of the correlation peak amplitude ( a I ), is superimposed onto the correlation map. This fluctuating component can originate from the correlation between non-corresponding particle images and correlation with the background. Figure 2b indicates that although from a statistical perspective the average sub-pixel error does not change from the ideal case, instantaneous differences between displacements obtained from R D and ⟨R D ⟩ can still occur, with magnitudes (corresponding to a 95% confidence level) ranging from one to ten times the order of the noise component (i.e. 0.006-0.02). This will show to be useful in the discussion of the numerical assessment. An in-depth analysis of sub-pixel measurement precision can be found in Westerweel (2008).

Implementation
In view of the common process of iterative image deformation to enhance spatial resolution (Scarano and Riethmuller 2000), the described approach remains valid but requires certain modifications to incorporate the equivalent correlation window deformation (Fig. 3). To this extent, the forward difference scheme is considered first, allocating a particle image at ′ with an associated displacement vector ( � ) . Accordingly, the particle's location in the second image snapshot will be given by �� = � + ( � ) and copied to location ��� = �� − −1 ( �� ) in a pseudoimage (viz. the deformed second snapshot). Here, represents the displacement field obtained from a previous iteration, i.e. the predictor. The inverse mapping −1 ( �� ) is defined such that ��� + ( ��� ) = �� i.e. − −1 ( �� ) is the vector, located at ′′ , based on the predicted displacement field, pointing to the location from where it came. The corrected displacement field thus becomes c ( � ) = ��� − � = ( � ) − −1 ( � + ( � )) . It should be noted, however, that c ( � ) can only be accounted for provided ′′′ falls within the original correlation window, i.e. | ��� − | ≤ 2 (cf. Eq. 5). Most common, however, is to deform both image snapshot by equal partitions of the predicted displacement (Werel ey a n d M e i n h a r t 2 0 0 1 ) ; I c In the construction of I c 1 ( ) , rather than the original locations , the first snapshots are thus sampled at � = − 1 2 ( ) , with corresponding displacement vectors pointing to �� = � + ( � ) (Fig. 3b). Also these ′′ locations in the second snapshot will be translated according the predictor field; ��� = �� − 1 2 −1 ( �� ) . The corrected displacement to be measured thus becomes more i n t r i c a t e : ) and can only be measured provided | ��� − | ≤ 2 . Despite the additional transformations, the corrected displacement data, c ( � ) , serve as input for Eq. 14, making the proposed methodology conducive in the study of general PIV algorithm response. Equations 14, 15 and 16 enable the calculation of the correlation map based on solely the underlying flow field. The algorithmic implementation is clarified in Algorithm 1.

Additional observations
Equation 14 can be extended to incorporate spectral filtering. However, similar to the continuous form (Sect. 4.2), it will be more appropriate to evaluate the normalisation and ensuing inverse Fourier transform numerically. This is especially true since, in case of SPOF, the rescaled Fourier transform pertaining the triangular function will equate to | | | When thicknesses d z are such that variations of are measurable across the illumination sheet, Eq. 3 must be used retaining the integration over Z ′ . The kernel density estimator F ,w is then constructed by considering individual samples within the illuminated volume and attributing particular weights depending on the samples' positions and shifts. The subsequent operations remain unchanged.

Synthetic image generation and analysis
To assess the validity of the derived approximations, 6000 synthetic 8bit PIV images, L x × L y = 525 × 525 px 2 in size, were generated with different underlying velocity fields. For each image, 6 × 10 3 particles were distributed randomly across the image and along the laser sheet thickness Z∕dz ∈ U −1 2 , 1 2 . Particle image diameters were drawn from a normal distribution  (4.5 px, 0.5 px) and integrated over pixels with a pixel fill ratio p f = 0.7 . An exemplary PIV image is depicted in Fig. 4a.
Within correlation windows, the mean intensity was subtracted and intensities were normalised with the standard deviation in captured intensity. The unbiased cross-correlation was performed by means of fast Fourier transforms (Raffel et al. 2018, p. 137-138); R D =  −1 ( (I � 1 ) (I �c 2 ))∕ −1 ( (W) (W c )) where I ′ 1,2 are the rescaled intensities within the correlation windows of size W S × W S . W is a unit matrix of equal size. Following the procedure described in Appendix 1, the representative particle image diameter, d * , was evaluated to be 4.6332 px . This value was subsequently used in Eq. 14 to yield the theoretical correlation map R D ( ) . Intensity weightings P 1 and P 2 were taken as unity for simplicity.
In the image analyses 2 × 2 different approaches were considered. The first consisted in calculating the displacement based on instantaneous correlation functions and subsequently averaging the obtained results. These results will be denoted hereafter by (⋅) . The alternative ensemble correlation involved first averaging the correlation maps and retrieve one single displacement estimate, symbolised as ⟨(⋅)⟩ in each direction (Meinhart et al. 2000). In both approaches, sub-pixel accuracy in the displacement estimates was then obtained either by means of the more common three-point Gaussian interpolation of the identified correlation peak (Willert and Gharib 1991) or two-dimensional Gaussian regression (Nobach and Honkanen 2005).
Flowcharts of the two image analyses approaches are provided in Fig. 5 for clarity. It becomes clear that the proposed semi-analytical correlation response simplifies the Monte-Carlo analysis adopting ensemble correlation. Although ensemble correlation is not standard practice in the analysis of instantaneous PIV images, the authors argue that it is the only manner to evaluate the filtering of the velocity field captured within a correlation window inherent to the statistical operator. Moreover, in view of validating time-averaged CFD solutions, ensemble correlation provides more reliable Fig. 4 a Illustration of a synthetic PIV images (contrast enhanced for clarity). b Underlying displacement field of the lid-driven cavity (Re = 100), under-sampled by a factor 15 for clarity. The red squares indicate the 99 × 99 px 2 interrogation windows analysed and robust (time-average) displacement estimates compared to the average of instantaneous displacement fields as it reduces random noise. In addition, mean displacement fields can be obtained considerably faster by ensemble correlation (Willert 2008).
At this point, the authors would also like to underline that the objective of the following assessment is not to scrutinise the performance of traditional Monte-Carlo simulations. Increasing the noise in the synthetic PIV images would only deteriorate the convergence in the average of the instantaneous data. It is for this reason that the authors have intentionally neglected image noise as an additional degree of freedom since it would not affect the outcome of the theoretical response. The authors neither claim the theoretical response to be more accurate compared to Monte-Carlo analyses. Instead, the assessment focuses on the validity, accuracy and robustness of Eq. 14 to replicate ensemble correlation maps such that the user can evaluate the filtering of the observed velocity field in a more efficient manner. This can be subsequently used to develop appropriate intensity weighting schemes P 1 and P 2 (viz. Eq. 6) or to filter the time-averaged CFD solutions such that the resulting field mimics the mean statistics of the experimental PIV data.

Lid-driven cavity
The first velocity field considered consisted of the numerically simulated flow field underlying a 2D lid-driven cavity at a cavity-height-based Reynolds number of 100 with a grid spacing of 525∕1000 px. 5 Discrete displacements were re-interpolated using linear interpolation to particle positions and rescaled to yield maximum displacements of 4 px. The corresponding (sub-sampled) velocity field is depicted in Fig. 4b. The lid is placed on the right-hand side and moves upwards by 4 px. Three correlation windows, each 99 × 99 px 2 in size, were selected within the flow field (see Fig. 4b). The first overlapped a secondary recirculation area located in the upper left corner and presented a region of strong velocity gradients. The second correlation window captured a region of nearly uniform displacement, whereas the third contained part of the larger recirculating structure. Fig. 5 Flowchart of the implemented image processing routine in the Monte-Carlo analyses adopting ensemble correlation and its theoretical counterpart Fig. 6 a Evolution of the displacement estimate obtained through cross-correlation of the first 99 × 99 px 2 correlation window depicted in Fig. 4b with number of image pairs considered. b Convergence in displacement estimate obtained from the correlation map estimated using Eq. 14 for the three 99 × 99 px 2 correlation windows shown in Fig. 4b with number of samples N 5 Details regarding the implementation can be found on http://loren abarb a.com/blog/cfd-pytho n-12-steps -to-navie r-stoke s/.
The evolution in displacement estimates with number of images is illustrated in Fig. 6a for the first correlation window and is representative for all analyses. Initially, both the arithmetic average (blue) and ensemble results (red), for either of the sub-pixel fitting schemes (hollow vs. filled symbols), show large fluctuations but converge to near-identical values as the number of images increases (Table 1). The data scatter in the (⋅) estimates is reflective of the random error Table 1 Displacement estimates for the 99 × 99 px 2 correlation windows (CW) placed in the lid-driven cavity flow (Fig. 4) incorporating 6000 image pairs The arithmetic averages are denoted by (⋅) , values from ensemble correlation by ⟨(⋅)⟩ and estimates based on Eq. 14 by ( ⋅) . Values between brackets correspond to 95% confidence levels (based on data spread) scaled by a factor 10 3 . True displacements in the correlation window centroids are, respectively,  . 7 Juxtaposition of ensemble correlation maps based on 6000 image pairs (top row) and estimated maps using Eq. 14 (bottom row) for the lid-driven cavity for correlation window 1 (a, d), 2 (b, e) and 3 (c, f) in Fig. 4b and levels will change with seeding density (Westerweel 2000). Equation 14 on the other hand yields a single value with no data scatter. Changing the sub-pixel displacement estimator (three-point interpolation or nine-point regression) has a noticeable effect though, yielding variations in displacement estimates up to 0.15 px (CW3, ⟨v⟩ , Table 1), even in case of isolated correlation peaks (CW2, ⟨u⟩ , Fig. 7b). The evolution in the displacements based on R D with number of sampling locations N per correlation window is depicted in Fig. 6b for the three correlation windows considering the horizontal displacement. For insufficient samples, R D will fail to be representative of ⟨R D ⟩ resulting in larger discrepancies. With increasing N, the kernel density estimate becomes more accurate and for N > 10 4 estimated displacements have converged. Choosing too large N, however, considerably augments computational effort with no gain in accuracy. For this reason, N = 10 5 is suggested as a generally appropriate value.
Correlation maps taking into account 6000 instantaneous correlation functions and N = 10 5 are juxtaposed in Fig. 7. Note that correlation maps have been normalised to a maximum of one to facilitate comparison and negate scaling factors. Correlation maps obtained through ensemble correlation and Eq. 14 are indistinguishable, irrespective of the underlying distribution. Though not visible in the displayed correlation maps, ⟨ R D ( ) ⟩ did not reach zero values for | | > but showed random fluctuations as a result of random particle correlations. These fluctuations were in the order of 3% of the peak amplitude. Random noise in the correlation map with this amplitude could lead to variations in displacement estimates with the three-point Gaussian fit as high as 0.025 px according Fig. 2b. The tabulated values in Table 1 suggest differences between the displacements obtained from ⟨R D ⟩ and R D , the latter denoted by ( ⋅) , are in the order of 0.005 px with a maximum of 0.016 px. These differences accordingly reduce when invoking Gaussian regression as the sub-pixel estimate becomes less sensitive to noise. However, also the displacement estimates alter by as much as 0.1 px (CW3) merely by changing the sub-pixel interpolation scheme. This example first demonstrates the displacements retrieved by cross-correlation do not correspond necessarily with the exact values at the window centroids, showing differences up to 1.5 px (CW3). These discrepancies concern how well a vector represents the underlying velocity field and are caused by correlation filtering. Second, the figure corroborates Eq. 14 to yield correlation maps which show negligible differences with those obtained through ensemble Fig. 8 Discrepancies in a horizontal and b vertical displacement components for the Oseen vortex (Eq. 18) between Eq. 14 ( ⋅) and ensemble correlation ⟨(⋅)⟩ . The red vectors represent the under-sampled (factor 3) and scaled (factor 5) underlying velocity field. The correlation map corresponding to the vector positioned at the white cross ( x = 260 px , y = 312 px ) is presented in Fig. 9. c Magnitude ũ − ⟨u⟩ (left hand side) placed side by side the product between displacement variance 2 u and kurtosis u of the underlying displacement field within a 25 × 25 px 2 correlation window (right-hand side). The white vertical line indicates the plane of symmetry of the velocity field correlation, yielding displacement estimates which differ from ensemble correlation by amounts of the same order of magnitude as the correlation uncertainty. Table 1 also incorporates the displacements following a weighted averaging: (ũ) MA = ∑ n i w i ( x ) i (cf. Eq. 6), evincing the averaging filter not to be a generally representative model of the correlation response. The correlation response is intrinsically non-linear, yet, under very particular conditions the (linear) averaging filter can provide a reasonable estimate of the true peak location of the correlation map. Considering the simplified problem of the sum of two Gaussians of equal variance separated by a distance (cf. Eq. 6), the resulting peak location will be located halfway (viz. the average) provided both normal distributions have equal amplitude and ∕ ≤ √ 2 . In terms of PIV applications, this translates into the very stringent conditions that the out-ofplane motion of the particles is (nearly) equal or properly rescaled by means of intensity weighting (viz. w i Eq. 6) and displacement variations do not exceed √ 2 times the particle image diameter. The proposed semi-analytical model instead provides a conducive alternative without imposing any restrictions.

Oseen vortex with window refinement
The ability to replicate the correlation response in case of iterative window refinement was assessed on the basis of an Oseen vortex adopting initial interrogation areas of 99 × 99 px 2 . These areas were reduced to 25 × 25 px 2 in three refinement steps maintaining 50% mutual overlap. Displacement vectors obtained within an iteration were linearly interpolated onto a pixel-wise grid, which was subsequently used to deform the images adopting a top-hat predictor filter kernel (Schrijer and Scarano 2008). Sub-pixel displacement estimates were retrieved based on Gaussian regression and intensities were interpolated using a quintic B-spline (Astarita and Cardone 2005). To negate influences related to accumulated inaccuracies in the inverse mapping (see Sect. 5.3) forward differencing in the image deformation step was considered, i.e. only the second snapshot was deformed (Fig. 3a).
With the origin placed at the image centre, the vortex radius was set to = 60 px and the maximum displacement (u ) max to 6 px ( = 1.25643 ). This can be considered a strong vortex. An additional 20% out-of-plane displacement component w was imposed ( W max ∕dz = 0.2); Bandwidths in the kernel density estimation were adapted locally as per Abramson (1982), reducing the bandwidth (and, therefore, smoothing) in regions of correlation peaks. Values of the estimated (pixelised) correlation map R D ( ) were re-interpolated linearly at the sampled displacements ( x ) i , ( y ) i yielding R D ( i ) . The original bandwidths h x and h y were subsequently rescaled locally following h * x,y ( i ) = h x,y ∕[R D ( i ) 0.5 ] with the geometric mean, = exp( 1 n ∑ n i log(R D ( i ))) . The KDE process was next repeated with the updated bandwidths h * x,y . The differences in the final velocity field obtained with ensemble correlation and Eq. 14 for the two velocity components are depicted in Fig. 8. Discrepancies attain magnitudes of at most 0.067 px and are concentrated near the vortex core, adopting a pattern dependent on the peculiarities of the underlying displacement field. The latter is highlighted in Fig. 8c, where the magnitude �ũ − ⟨u⟩� and product between the kurtosis and variance of the displacement distribution captured within a 25 × 25 px 2 window are displayed side by side, evincing the origin of the lobes observed in the displacement discrepancies to be related to the underlying displacement distribution and ensuing correlation map (cf. Eq. 11) . The correlation maps pertaining the vector with the highest difference in the vertical displacement, located at (x, y) = (260 px, 312 px) , are presented in Fig. 9. Correlation maps show minimal discrepancies, corroborating the multi-iterative process, including the inherent image deformation, to be correctly modelled. Observing the difference between the correlation maps in Fig. 9c, it can be noted that a bias is present. With maximum displacements in the order of 6 px and a representative particle image diameter in the order of 4.6 px, the correlation map should attain zero values beyond radii of 10 px. While R D ( ) correctly predicts this to be the case, the ensemble correlation map contains contributions from random particle image pairs, yielding slightly negative and fluctuating correlation values. 6 Plotting the largest eigenvalue of the Hessian of the ensemble correlation map, areas of largest differences between R D ( ) and ⟨ R D ( ) ⟩ can be seen to correspond to the region of higher curvature (i.e. larger eigenvalue) in line with Eq. 11. Even though the differences in correlation coefficient are in the order of 6%, this is sufficient to influence the sub-pixel peak fitting and yield variations in estimated displacement of the same order of magnitude (cf. Fig. 2b). Nevertheless, the systematic error in the ensemble correlation estimates (Fig. 10a) caused by the correlation to yield the most probable displacement which is, therefore, dependent on local curvature in the displacement field, is also observed in the theoretical map (Fig. 10b). The noticeable absence of axial symmetry is due to the fact that correlation windows are not located radially equidistant with respect to the vortex core. This example thus demonstrates the derived KDE-based analytical expressions for correlation to be applicable in iterative PIV image analyses and to be useful in simulating general correlation responses.

Conclusions
Cross-correlation is the standard operator in PIV image analyses and intrinsically behaves as a non-linear operator, introducing a filtering in the measurement of underlying displacement fields. The response of the cross-correlation can be appropriately modelled as a (linear) moving averaging, provided the correlation map exhibits a single peak and particle images within the correlation windows have nearly equal intensity and out-of-plane displacement. These restrictions are typically not satisfied and for this reason in this paper a generally conducive methodology is presented in which an integral equation, describing the cross-correlation response for 2D/2C PIV, is solved by estimating the density function pertaining the observed velocity field using Gaussian kernel density estimation (KDE). Implementation of the resulting semi-analytical equations is shown to be simplistic and to replicate ensemble correlation, also when incorporated in iterative multi-grid routines. This approach enables the inherent cross-correlation filtering/response for any known steady velocity field to be characterised. Variability in Monte-Carlo results due to alterations in seeding density and/or image noise is thereby negated. The proposed routine thus replicates the ensemble-averaged PIV result if an experiment were performed whereby the underlying steady velocity field can be mathematically defined or is as predicted by a steady CFD simulation. Given its efficiency, the methodology is, therefore, ideally suited to investigate the modulation introduced by cross-correlation in PIV and CFD uncertainty evaluation.
Acknowledgements The authors would like to thank EPSRC for Matthew Edwards' financial support (EP/M507994/1).

Compliance with ethical standards
Data access statement All underlying data are described and provided in full within this paper.

Fig. 10
Bias error in horizontal displacement component with respect to the imposed Oseen vortex, based on (a) ensemble correlation (b) Eq. 14 6 Given a typical intensity histogram of a PIV image (e.g Westerweel 2000), the mean intensity I within a correlation window will be higher than the intensity associated with the largest probability. The value attributed to the probability mode of the product of two pixel intensities, I 1 and I 2 , will accordingly be inferior to I 2 . As such, the contribution I � 1 I � 2 = (I 1 − I)(I 2 − I) to the respective correlation is more likely to be negative.
Open Access This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creat iveco mmons .org/licen ses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made.

Appendix 1: Particle image self-correlation
Particle image intensity distributions can be conveniently modelled as Gaussian, viz. o ( ) = exp(− 2 ( d ) −2 ) . The convolution o * o ( ) will subsequently be also Gaussian with a variance 2( d ) 2 whereby diameters are independently drawn from an identical probability density function g(d ) . The ensemble average of the intensity-weighted particle image self-correlation is then given as (cf. footnote 2) For typical PIV applications, the intensities I zk will be independent of the particle diameters. When all particle images have equal diameters d , g( i ) = ( i − d ) , yielding a Gaussian descriptor ⟨F ( )⟩ ∝ exp(− ∕2 2 d 2 ) . In case g( i ) assumes a normal distribution with mean d and variance 2 viz. g( i ) =  (d , 2 d ) , no explicit equation exists for ⟨F ⟩ and Eq. 19 must be evaluated numerically: where i = id and d = (d + d )∕N . The assumption of a normal distribution for g( ) is permitted considering the particle image diameter is composed of the diffraction-limited diameter (constant for all image diameters) and magnified physical particle diameter. The latter originates from a seeding device, which commonly generates tracers with a normal size distribution. For sufficiently large i the probability g( i ) tends to zero and in choosing the values of i to consider, = 16 will be sufficient. The process is illustrated in Fig. 11 for N = 10 5 . Since the exponential terms are separable in s and r, viz. = (s, r) , without loss of generality only the s-component has been considered. If ⟨F ⟩ were truly normally distributed, it would attain a width of 4 √ 2 d at e −4 intensity ratio. It reaches, however, the e −4 amplitude at s > 2 √ 2 d and the Gaussian approximation can be seen to insufficiently include the tail of the self-correlation. To this extent, the image diameter in the Gaussian approximation of ⟨F ⟩ must be increased by a factor =s∕2 √ 2( d ) . In other words, the representative particle image diameter in the assumed Gaussian-shaped self-correlation, exp(−s 2 ∕2 2 d * 2 ) , will be d * = d . Note that the process can be easily extended to incorporate any dependency of I zi on the particle diameter yielding again, although not depicted, a ⟨F ⟩ which can be approximated by a Gaussian involving a scaled image diameter.

Appendix 2: Kernel density approximation error
In Eq. 6 each combination (( x ) i , ( y ) i ) can appear multiple times, with weights determined by ( z ) i ) . Leaving the bandwidths unaltered, the summation is equivalent to a kernel density estimate f ( ) involving each realisation of one time only, omitting the weighting function. Following Epanechnikov (1968) the averaged empirical density is given by Expanding the term F ,w (s − h x , r − h y ) in Taylor series with respect to (s, r) and using the identities ∫ G(u)du = 1 ,