Smooth estimation of size distributions in an oriented cylinder model

Kernel estimators are proposed for estimating the cumulative distribution functions and the probability density functions of several quantities of interest in a stereological oriented cylinder model. This oriented cylinder model was developed to represent anisotropic microstructural features in materials. The asymptotic properties of these estimators are studied, and the estimators are applied to two banded dual phase steel microstructures. The estimation method is quite general and can also be applied to distributions of other univariate quantities of interest.


Introduction
The so-called Wicksell problem introduced in Wicksell (1925) is a classical inverse problem in statistics. The original motivation was medical. A postmortem examination of spleens containing approximately spherical tumors was performed. Based on cross sections of the spleens (showing circular profiles of the tumors), the aim was to estimate the distribution of tumor sizes based on the observed circle radii. Wicksell's problem is a typical example of a stereological problem, where one aims to infer 'three-dimensional properties' from 'two-dimensional information'. Not only within the field of anatomy, but also in materials science and astronomy, this type of problem is frequently encountered. See, e.g., Sen and Woodroofe (2011) for an astronomical application of the model. Over the years, quite some stereological problems related to Wicksell's problem have been introduced and studied; see, e.g., Ohser and Mücklich (2000) for problems related to different shapes of the three-dimensional objects and Feuerverger and Hall (2000) for a problem where the data are obtained slightly differently.
In this paper, we study another related model, specifically designed for a materials science problem. In this model, circular cylinders (all with the same orientation, say vertical axes) are distributed within an opaque medium which is cut vertically (parallel to the axes). The problem then is to estimate distributional properties of various threedimensional quantities related to size (volume, surface area, e.g.,) only based on data obtained from the two-dimensional section. This model was introduced in McGarrity et al. (2014), where also the relations between the distributions of (unobservable) threedimensional quantities and (observable) two-dimensional quantities are derived. These will be reviewed in Sect. 2. In that paper, estimators of the distribution functions are defined and studied asymptotically. These estimators are step functions. Especially in the metallurgical context, such cumulative distribution functions are considered undesirable, as they are harder to interpret for practitioners than density functions that give more direct visual information on the relative occurrence of the various sizes in the material. Section 3 discusses smooth estimators for cumulative functions in the oriented cylinder model. A particular feature of these smooth estimators is that their pointwise asymptotic behavior does depend on the rate at which the bandwidth vanishes (rate n −1/4 is optimal), but not on the constant in front of this rate.
Estimates of density functions can be obtained from these via differentiation. In Sect. 4, these density-like functions are defined and studied asymptotically. It turns out that for estimating these, the bandwidth should vanish at rate n −1/6 in order to let the MISE vanish at rate n −2/3 . The choice of the constant in front of n −1/6 is important, and we describe a reference family method to obtain data driven bandwidths based on the expressions for the asymptotically MISE-optimal bandwidths. Finally, in Sect. 5 we apply the proposed estimators to a real microstructural dataset obtained at TATA Steel.

An oriented cylinder model
In the process of representing microstructural features of interest like those mentioned in Sect. 1, a first (simple) model was proposed in McGarrity et al. (2014). We describe this model briefly here. Consider a large box in 3D, that is cut by a vertical plane. Throughout the box, points are distributed according to a low-intensity Poisson process that is homogeneous in the direction perpendicular to the cutting plane. At these points, circular cylinders are placed, all oriented in the same way, with vertical axes of symmetry. See Fig. 1 for an illustration of the situation. The squared radius X (which we consider rather than the radius itself, following the example of Hall and Smith (1988) in Wicksell's problem) and height H of the cylinders are generated as i.i.d. bivariate random vectors (X , H ) drawn from the bivariate density f , corresponding to the 3D microstructural features of interest. Note that f is a joint density and X and H are not assumed independent. The data consist of the rectangular profiles of the cylinders cut by the plane. The height of the rectangle is equal to the height of the cut cylinder, and the width of the rectangle is a fraction of the diameter of the cylinder. Taking into account that the probability of cutting a cylinder by the plane depends on the radius of the cylinder, the relationship between the joint density g of the observed rectangle pairs (Z , H ), the squared half-width Z and height H , and the joint density f of (X , H ) is derived in McGarrity et al. (2014): (1) Here, This equation can be inverted to express the joint density f in terms of g: Here Based on these relations, it is possible to estimate the distribution of univariate quantities related to the distribution of (X , H ). In this paper, we restrict ourselves to the squared radius (X ) and the volume V = π X H.
In order to save space in notation, define for h, t > 0 the function These quantities are chosen such that the random variable of interest, T , satisfies T > t if and only if X > q(H ; t). Hence, using (2) we obtain the general form of the distribution functions for these quantities where The distribution functions of the unobservable cylinder quantities are expressed now in terms of N which can in turn be derived from the joint density g of the observable pairs (Z , H ).
In practice, one is often also interested in an estimate of the probability density functions of the various univariate quantities mentioned in (3). Taking the derivative of (4) yields Now N can be estimated empirically, replacing the integral with respect to the joint density g in (5) by the integral with respect to the empirical distribution of the observed pairs (X i , H i ), 1 ≤ i ≤ n. This leads to While this estimator will provide an estimate for the function N given in (5), it can have some undesirable properties. One is that it is not a decreasing function, in fact it has discontinuities of infinite size. Indeed, taking the squared radius as example, such discontinuity appears at t = Z i , for any value of i. In McGarrity et al. (2014), an isotonization procedure is used to obtain an estimator that is a decreasing step function.
Another, related issue is that the estimate N n will not be smooth. Often one wants to assume that N is smooth and estimate its derivative. This function gives insight in the relative probabilities with which certain values of the quantities of interest occur. In Sects. 3 and 4, we introduce kernel estimators for the functions N and their derivatives.

Estimators for the function N
In view of the function N defined in (5) and its empirical estimator N n given in (7), there are various approaches one can take to obtain smooth estimators for N . One approach is to substitute a smoothed empirical distribution of the observed pairs (X i , H i ) for g in (5) rather than the empirical distribution itself. In the (univariate) context of Wicksell's problem, this approach was originally proposed in Taylor (1983). A related estimator (based on squared radii rather than radii) was introduced in Hall and Smith (1988).
Still in Wicksell's problem, van Es and Hoogendoorn (1990) suggest an alternative smooth estimator, obtained that by kernel smoothing of the function N n .
Following the latter approach, we define a smooth estimator of N , based on smoothing the empirical plug-in estimator N n defined in (7). For this, we use a smoothing kernel K , having the usual properties (symmetric continuously differentiable probability density supported on [−1, 1]). To make it more concrete, we use the biweight kernel K , defined as although this particular choice is not essential. Take a sequence of bandwidths (b n ) with b n ↓ 0 as n → ∞ and define for t > 0 For this estimator, the mean squared error for estimating N (t) (for fixed t) is defined by In order to study the asymptotic behavior of this MSE at fixed location t > 0, we impose the following condition: Now, under Condition 3.1 and fixing t > 0, for n tending to infinity the expectation of N n (t) is given by where ξ u,n denotes a point between t and t − ub n . For the squared bias part of the MSE this yields, for n → ∞, Note that this asymptotic bias can be derived for both choices of q listed in (3) simultaneously. For the asymptotic variance of N (t), both choices for q need to be dealt with separately. We follow the approach adopted in Hall and Smith (1988) for Wicksell's problem and express N n as a convolution of two functions.
First for the volume, we get (using that t > b n for sufficiently large n because For the squared radius, also using that t > 0 and so t > b n for sufficiently large n, we obtain For smooth kernel functions supported on [−1, 1], such as the biweight function given in (8), define the function (11) See (30) for the explicit expression and Fig. 2 for a visualization of the functionK based on the biweight Kernel function defined in (8). Then, the function N n corresponding to the volume can be expressed as In a similar fashion, we get for the function N n corresponding to the squared radius distribution Now note that for v < −1, This leads to the following asymptotic behavior ofK To pin down the asymptotic variance of N n (t), we also need the function for both the squared radius and volume.
Proof Consider N vol n (t). Using representation (12) and the definition of τ q (z) at z = t, for the volume we have Using continuity of N vol at t yields giving, for n → ∞, Now, for > 0 and n sufficiently large such that b n < , For I 2 , squaring the upper bound onK given in (14) and using that for π hz > t + > t + b n , we have (t − π hz)/b n < −1 for all n sufficiently large. Since E g [H ] < ∞, I 2 is bounded as n → ∞. For I 1 , we have for any c < −1 and n sufficiently large For any fixed c, the second term is clearly bounded by a constant. Taking c < −1 sufficiently small, by right continuity of τ q at t, the first term becomes using (15), the fact that can be chosen arbitrarily small in this argument and dominated convergence. The exact same method can be used for the squared radius. The result for I 2 is 2/ since there is no π h term in the integral. The result for I 1 will be exactly the same as the result for the volume where τ q (t) is g Z (t). Together, these again lead to (17).
We can now prove the following theorem.

Moreover, n ln n N n (t) − N (t) → D N (0, τ q (t)/4) as n → ∞.
This holds for both the squared radius and the volume distribution.
Proof The MSE part immediately follows from the asymptotic bias derived in (10) combined with Lemma 3.2. The asymptotic distribution result follows by also using the central limit theorem for i.i.d. random variables with infinite variance; see Chow and Teicher (1988) Theorem 4 on p. 305).
As a consequence, the asymptotically MSE optimal bandwidth is given by The MSE of the initial plug-in estimator N n defined in (7) is infinite, because its variance is infinite. A notable property of the estimator N n (t) is that as long as the bandwidth tends to zero at rate n −1/4 , the asymptotic MSE does not depend on the constant that is chosen in the bandwidth. In Sect. 5.4 of McGarrity (2013), a numerical simulation illustrates the difference between the initial plug-in estimator N n and the smoothed estimator N n and how the bandwidth impacts the estimates of the distribution functions of the squared radius and volume. In other contexts, including the estimation of the density function that will be considered in Sect. 4, choosing this constant optimally is often a delicate matter. Another notable fact is the value of the asymptotic MSE in relation to asymptotic distribution results of the empirical (non-smoothed) estimator N n and the isotonic inverse estimator studied by McGarrity et al. (2014). Both estimators are asymptotically unbiased, and normal with variance τ q (t) and τ q (t)/2, respectively (both rescaled with rate √ n/ ln n). In view of Theorem 3.3, these estimators are comparable to smoothed estimators with bandwidths of order n −1 and n −1/2 , respectively. Taking these small bandwidths results in asymptotically unbiased estimators. Smoothing a bit more (using b n ∼ n −1/4 ) decreases the variance but still results in an (asymptotically) unbiased estimator. Taking a larger bandwidth will make the bias term in the MSE the dominating one, and increase the asymptotic MSE.
The derivative of this smooth estimator may be taken to give an estimate of the density. This will be explored further in the next section.

Smooth density estimators
In order to obtain estimators for the densities f , we study the derivative of N n as given in (9). Contrary to the non-smoothed estimator N n , the estimator N n can be differentiated to obtain an estimator for the derivative ν = N . We define the estimator of this derivative as (18) Note that just as in the setting of estimating N (t), the expectation of the estimators for the function ν related to the two choices of q in (3) In order to obtain the asymptotic variance of the estimators for the squared radius and volume, we use representations (12) and (13)  (20) For the variances of the estimators, we have the following lemma.  (16) is right continuous at t. Let K be a continuously differentiable symmetric probability density with support [−1, 1]. Then, as b n ↓ 0, This result holds for the squared radius as well as the volume.
Proof Considering the volume, by (20), nb 2 n Var( ν vol n (t)) equals Using the asymptotic bias (19) and Condition 4.1, it follows that the second term in the above expression is o(1) for n → ∞. Because K (±1) = K (±1) = 0 for the kernel function we consider, for v < −1 implying that for v < −1 This bound, with boundedness ofK , imply thatK is square integrable. Now note that where we use dominated convergence and right continuity of τ q at t.
As the global behavior of the density estimator as a function is maybe even more of interest than its local behavior (more so than for the estimator of the distribution function), the mean integrated squared error, is also interesting for ν n . We have the following result.
for the squared radius and volume. If ν has a uniformly bounded third derivative and f bounded support in [0, ∞) 2 , then Proof The asymptotics of the MSE immediately follows from (19) and Lemma 4.2.
For the MISE, note that bounded support of f implies bounded support of g which in turn implies bounded support of τ q and ν.
From Theorem 4.3, we infer that the asymptotic M(I)SE-optimal bandwidth corresponds to a balance of the two terms, leading to b n ∼ n −1/6 . Taking b n = αn −1/6 , the asymptotically MISE-optimal choice for α is given by Taking the asymptotically optimal bandwidth leads to Unlike the asymptotically MSE-optimal bandwidth for estimating N (t), which is dependent only on the sample size, finding the MISE-optimal bandwidth for estimating the pdf must be done more carefully. This bandwidth also depends on the second derivative of the function being estimated, as well as on integrals related to the kernel.
For the kernel-dependent constants in (22) based on the biweight kernel, Details on the latter are given in the appendix. Furthermore, note that for the squared radius (to which we restrict ourselves for the moment) so that considering the squared radius and using the biweight kernel, the asymptotically MISE optimal bandwidth is given by n −1/6 ≈ 280σ X σ H n −1/6 ≈ 5.2Z nHn n −1/6 .
5 Application to a steel microstructure Figure 4 shows the binary image of a steel microstructure obtained at TATA Steel. The grey rectangles correspond to the bounding boxes of the features of interest and are used as the observed rectangles on the cut plane within the oriented cylinder model. Bounding boxes were chosen because they are well-defined rectangular objects around the interesting structures in the image, which themselves are certainly not perfectly rectangular. From these bounding boxes, the pairs (Z i , H i ) are distilled. As such, a total of 175 pairs are obtained, of which 4 are ignored because the corresponding Zvalues are way out of range. Figure 5 shows the scatter plot of the data obtained. For a complete discussion on the choice and effects of the bounding box on the estimation results, see Sect. 5.4 in McGarrity (2013).  Figure 7 shows the empirical plug-in estimator N n defined in (7) as well as the kernel estimatorÑ n for N based on the microstructure data. In Fig. 8, the isotonic estimators as well as the kernel estimators for the distribution functions F of the squared radius and the volume are given. For these estimates, we use relation (4), taking 1 n as (consistent) estimator of N (0). From these pictures, it is clear that the isotonic and kernel smoothed estimators are quite close.
As indicated before, special interest is in the estimates of the functions ν for both the squared radii and the volumes. For the squared radii, Z n ≈ 99.7, leading via (25) to bandwidth choice b n = 2.4Z n n −1/6 ≈ 102.
The left panel of Fig. 9 shows the resulting kernel estimate for the probability density f (related to ν via (6), also using (29) as estimator for N (0)).
For the volume density, the reference family with Gamma densities requires an a priori choice for the respective shape parameters α and β. As the independence between H and Z in the reference family implies that the observed H -values could be viewed as sample from the Gamma distribution of interest, we computed the MLE based on the observed values, resulting in β ≈ 7. See Fig. 6, showing a surprisingly good Gamma fit to the observed heights. For the squared radii, we more arbitrarily choose a Gamma density that guarantees convergence of the needed integrals, α = 11. Based on the measured data (H n = 9.7,Z n = 99.7), we obtain via (27) σ H =H β ≈ 1.4 andσ X = 3Z 2α + 1 ≈ 13, leading via (28) to bandwidth choice b n = 5.2Z nHn n −1/6 ≈ 2130. Figure 9 shows the resulting kernel estimate of the probability density of the volume.

Discussion
We consider estimation of distributions within the oriented cylinder model. This is an extension of the classical Wicksell model. The estimators considered up till now in the oriented cylinder model do not include estimators that can be differentiated to obtain estimators of the probability densities. In this paper, we introduce smooth estimators that can be used in that way. We restrict attention to the estimation of the distribution of the (squared) radius of the cylinder base and the volume of the cylinders. Other aspects can also be of interest and studied in the same way based on relation (2). For example, the distributions of the aspect ratio R = √ X /H and the surface area S = 2π(X + √ X H). The corresponding functions q are given by surface area: T = 2π(X + √ X H).
Besides the ones considered in this paper, there are other natural choices to estimate N . One would be to use the isotonic estimator of Groeneboom and Jongbloed (1995) as initial estimator and smoothing this. The other is to isotonize the estimatorÑ n . The asymptotic theory for the first type of estimator will be much harder to develop. Moreover, the conjecture is that the resulting estimators will not be better asymptotically than N n . See Groeneboom et al. (2010) for a study of smoothed isotonic estimators in the current status model, where such 'smoothed isotonic' and 'isotonized smooth' estimators are studied and shown to have similar asymptotic behavior.