
Nearest Neighbor Sampling of Point Sets Using Rays

  • Original Paper
  • Published: Communications on Applied Mathematics and Computation

Abstract

We propose a new framework for the sampling, compression, and analysis of distributions of point sets and other geometric objects embedded in Euclidean spaces. Our approach involves constructing a tensor called the RaySense sketch, which captures nearest neighbors from the underlying geometry of points along a set of rays. We explore various operations that can be performed on the RaySense sketch, leading to different properties and potential applications. Statistical information about the data set can be extracted from the sketch, independent of the ray set. Line integrals on point sets can be efficiently computed using the sketch. We also present several examples illustrating applications of the proposed strategy in practical scenarios.

Data Availability

All the datasets used in this paper are well-known public datasets, and they are available through a simple search.

Notes

  1. RaySense does not require point clouds as inputs: we could apply RaySense directly to surface meshes, implicit surfaces, or even, given a fast nearest-neighbor calculator, to CAD models directly.

References

  1. Atzmon, M., Maron, H., Lipman, Y.: Point convolutional neural networks by extension operators. arXiv:1803.10091 (2018)

  2. Bentley, J.L.: Multidimensional binary search trees used for associative searching. Commun. ACM 18(9), 509–517 (1975)

  3. Besl, P.J., McKay, N.D.: Method for registration of 3-D shapes. In: Sensor Fusion IV: Control Paradigms and Data Structures, vol. 1611 (1992)

  4. Caflisch, R.E.: Monte Carlo and quasi-Monte Carlo methods. Acta Numer. 7, 1–49 (1998)

  5. Chang, A.X., Funkhouser, T., Guibas, L., Hanrahan, P., Huang, Q., Li, Z., Savarese, S., Savva, M., Song, S., Su, H., Xiao, J.X., Yi, L., Yu, F.: ShapeNet: an information-rich 3D model repository. arXiv:1512.03012 (2015)

  6. Corless, R.M., Gonnet, G.H., Hare, D.E., Jeffrey, D.J., Knuth, D.E.: On the Lambert W function. Adv. Comput. Math. 5(1), 329–359 (1996)

  7. Curless, B., Levoy, M.: A volumetric method for building complex models from range images. In: SIGGRAPH 1996 Proceedings (1996)

  8. Cutler, A., Breiman, L.: Archetypal analysis. Technometrics 36(4), 338–347 (1994)

  9. Daley, D.J., Vere-Jones, D.: An Introduction to the Theory of Point Processes. Volume I: Elementary Theory and Methods. Springer, New York (2003)

  10. De Bruijn, N.G.: Asymptotic Methods in Analysis, vol. 4. Courier Corporation, USA (1981)

  11. Doerr, B.: Probabilistic tools for the analysis of randomized optimization heuristics. In: Theory of Evolutionary Computation, pp. 1–87. Springer, Cham, Switzerland (2020)

  12. Draug, C., Gimpel, H., Kalma, A.: The Octave Image package (version 2.14.0) (2022). https://gnu-octave.github.io/packages/image

  13. Dvoretzky, A., Kiefer, J., Wolfowitz, J.: Asymptotic minimax character of the sample distribution function and of the classical multinomial estimator. Ann. Math. Stat. 27(3), 642–669 (1956)

  14. Eldar, Y., Lindenbaum, M., Porat, M., Zeevi, Y.Y.: The farthest point strategy for progressive image sampling. IEEE Trans. Image Process. 6(9), 1305–1315 (1997)

  15. Fang, Y., Xie, J., Dai, G., Wang, M., Zhu, F., Xu, T., Wong, E.: 3D deep shape descriptor. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 2319–2328 (2015)

  16. Graham, R., Oberman, A.M.: Approximate convex hulls: sketching the convex hull using curvature. arXiv:1703.01350 (2017)

  17. Hadwiger, H.: Vorlesungen Über Inhalt, Oberfläche und Isoperimetrie, vol. 93. Springer, Berlin (1957)

  18. Helgason, S., Helgason, S.: The Radon Transform, vol. 2. Springer, New York (1980)

  19. Indyk, P., Motwani, R.: Approximate nearest neighbors: towards removing the curse of dimensionality. In: Proceedings of the Thirtieth Annual ACM Symposium on Theory of Computing, pp. 604–613 (1998)

  20. Ioffe, S., Szegedy, C.: Batch normalization: accelerating deep network training by reducing internal covariate shift. arXiv:1502.03167 (2015)

  21. Jiménez, P., Thomas, F., Torras, C.: 3D collision detection: a survey. Comput. Gr. 25(2), 269–285 (2001)

  22. Johnson, J., Douze, M., Jégou, H.: Billion-scale similarity search with GPUs. IEEE Trans. Big Data 7(3), 535–547 (2019)

  23. Jones, P.W., Osipov, A., Rokhlin, V.: A randomized approximate nearest neighbors algorithm. Applied and Computational Harmonic Analysis 34(3), 415–444 (2013)

  24. Kazmi, I.K., You, L., Zhang, J.J.: A survey of 2D and 3D shape descriptors. In: 2013 10th International Conference Computer Graphics, Imaging and Visualization, pp. 1–10. IEEE (2013)

  25. Kingma, D.P., Ba, J.: Adam: a method for stochastic optimization. arXiv:1412.6980 (2014)

  26. Klain, D.A., Rota, G.-C.: Introduction to Geometric Probability. Cambridge University Press, Cambridge (1997)

  27. Klokov, R., Lempitsky, V.: Escape from cells: deep KD-networks for the recognition of 3D point cloud models. In: Proceedings of the IEEE International Conference on Computer Vision, pp. 863–872 (2017)

  28. Krig, S.: Interest point detector and feature descriptor survey. In: Computer Vision Metrics, pp. 187–246. Springer, Cham (2016)

  29. LeCun, Y.: The MNIST database of handwritten digits (1998). http://yann.lecun.com/exdb/mnist/

  30. Li, J.X., Chen, B.M., Lee, H.: SO-Net: self-organizing network for point cloud analysis. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 9397–9406 (2018)

  31. Li, Y., Bu, R., Sun, M., Wu, W., Di, X., Chen, B.: PointCNN: convolution on X-transformed points. In: Advances in Neural Information Processing Systems, pp. 820–830 (2018)

  32. Lin, M., Gottschalk, S.: Collision detection between geometric models: a survey. Proc. IMA Conf. Math. Surf. 1, 602–608 (1998)

  33. Macdonald, C.B., Merriman, B., Ruuth, S.J.: Simple computation of reaction-diffusion processes on point clouds. Proc. Natl. Acad. Sci. 110(23), 9209–9214 (2013)

  34. Macdonald, C.B., Miller, M., Vong, A., et al.: The Octave Symbolic package (version 3.0.1) (2022). https://gnu-octave.github.io/packages/symbolic

  35. Mahalanobis, P.C.: On the generalised distance in statistics. Proc. Natl. Inst. Sci. India 2, 49–55 (1936)

  36. Massart, P.: The tight constant in the Dvoretzky-Kiefer-Wolfowitz inequality. Ann. Prob. 1990, 1269–1283 (1990)

  37. Meurer, A., Smith, C.P., Paprocki, M., Čertík, O., Kirpichev, S.B., Rocklin, M., Kumar, A., Ivanov, S., Moore, J.K., Singh, S., Rathnayake, T., Vig, S., Granger, B.E., Muller, R.P., Bonazzi, F., Gupta, H., Vats, S., Johansson, F., Pedregosa, F., Curry, M.J., Terrel, A.R., Roučka, Š., Saboo, A., Fernando, I., Kulal, S., Cimrman, R., Scopatz, A.: SymPy: symbolic computing in Python. PeerJ Comput. Sci. 3, e103 (2017). https://doi.org/10.7717/peerj-cs.103

  38. Nair, V., Hinton, G.E.: Rectified linear units improve restricted Boltzmann machines. In: Proceedings of the 27th International Conference on Machine Learning (ICML-10), pp. 807–814 (2010)

  39. Natterer, F.: The Mathematics of Computerized Tomography. Society for Industrial and Applied Mathematics, USA (2001)

  40. Osting, B., Wang, D., Xu, Y., Zosso, D.: Consistency of archetypal analysis. SIAM J. Math. Data Sci. 3(1), 1–30 (2021).

  41. Paszke, A., Gross, S., Chintala, S., Chanan, G., Yang, E., DeVito, Z., Lin, Z., Desmaison, A., Antiga, L., Lerer, A.: Automatic differentiation in PyTorch. In: NIPS Autodiff Workshop (2017)

  42. Peyré, G., Cuturi, M., et al.: Computational optimal transport: with applications to data science. Found. Trends® Mach. Learn. 11(5–6), 355–607 (2019)

  43. Pollard, D.: Convergence of Stochastic Processes. Springer, New York (1984)

  44. Qi, C.R., Su, H., Mo, K., Guibas, L.J.: PointNet: deep learning on point sets for 3D classification and segmentation. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (2017)

  45. Qi, C.R., Yi, L., Su, H., Guibas, L.J.: PointNet++: deep hierarchical feature learning on point sets in a metric space. In: Advances in Neural Information Processing Systems, pp. 5099–5108 (2017)

  46. Radon, J.: Über die Bestimmung von Funktionen durch ihre Integralwerte längs gewisser Mannigfaltigkeiten. Class. Pap. Mod. Diagn. Radiol. 5(21), 124 (2005)

  47. Rostami, R., Bashiri, F.S., Rostami, B., Yu, Z.: A survey on data-driven 3D shape descriptors. Comput. Graph. Forum 38(1), 356–393 (2019)

  48. Ruuth, S.J., Merriman, B.: A simple embedding method for solving partial differential equations on surfaces. J. Comput. Phys. 227(3), 1943–1961 (2008)

  49. Santaló, L.A.: Integral Geometry and Geometric Probability. Cambridge University Press, New York (2004)

  50. Sedaghat, N., Zolfaghari, M., Amiri, E., Brox, T.: Orientation-boosted voxel nets for 3D object recognition. arXiv:1604.03351 (2016)

  51. Settles, B.: Active learning literature survey. Technical report, University of Wisconsin-Madison Department of Computer Sciences (2009)

  52. Shen, Y., Feng, C., Yang, Y., Tian, D.: Mining point cloud local structures by kernel correlation and graph pooling. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR) (2018)

  53. Simonovsky, M., Komodakis, N.: Dynamic edge-conditioned filters in convolutional neural networks on graphs. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 3693–3702 (2017)

  54. Sinha, A., Bai, J., Ramani, K.: Deep learning 3D shape surfaces using geometry images. In: European Conference on Computer Vision. Springer, Berlin (2016)

  55. Solmon, D.C.: The X-ray transform. J. Math. Anal. Appl. 56(1), 61–83 (1976)

  56. The mpmath development team: mpmath: a Python library for arbitrary-precision floating-point arithmetic (version 1.2.1). (2021). https://mpmath.org/

  57. Trefethen, L.N., Weideman, J.A.C.: The exponentially convergent trapezoidal rule. SIAM Rev. 56(3), 385–458 (2014)

  58. Tsai, Y.-H.R.: Rapid and accurate computation of the distance function using grids. J. Comput. Phys. 178(1), 175–195 (2002)

  59. Villani, C.: Optimal Transport: Old and New, vol. 338. Springer, Berlin (2009)

  60. Wang, P.-S., Liu, Y., Guo, Y.-X., Sun, C.-Y., Tong, X.: O-CNN: Octree-based convolutional neural networks for 3D shape analysis. ACM Trans. Gr. (TOG) 36(4), 72 (2017)

  61. Wang, Y., Sun, Y., Liu, Z., Sarma, S.E., Bronstein, M.M., Solomon, J.M.: Dynamic graph CNN for learning on point clouds. ACM Trans. Gr. (TOG) 38(5), 146 (2019)

  62. Wu, Z., Song, S., Khosla, A., Yu, F., Zhang, L., Tang, X., Xiao, J.: 3D shapenets: a deep representation for volumetric shapes. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 1912–1920 (2015)

  63. Xia, F., et al.: PointNet.pytorch Git repository. https://github.com/fxia22/pointnet.pytorch

  64. Xie, J., Dai, G., Zhu, F., Wong, E.K., Fang, Y.: Deepshape: deep-learned shape descriptor for 3D shape retrieval. IEEE Trans. Pattern Anal. Mach. Intell. 39(7), 1335–1345 (2016)

  65. Zhou, Y., Tuzel, O.: VoxelNet: end-to-end learning for point cloud based 3D object detection. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 4490–4499 (2018)

Acknowledgements

Part of this research was performed while Macdonald and Tsai were visiting the Institute for Pure and Applied Mathematics (IPAM), which is supported by the National Science Foundation (Grant No. DMS-1440415). This work was partially supported by a grant from the Simons Foundation, NSF Grants DMS-1720171 and DMS-2110895, and a Discovery Grant from the Natural Sciences and Engineering Research Council of Canada. The authors thank the Texas Advanced Computing Center (TACC) and UBC Math Dept Cloud Computing for providing computing resources.

Author information

Corresponding author

Correspondence to Liangchen Liu.

Ethics declarations

Conflict of Interest

The authors have no competing interests to declare that are relevant to the content of this article.

Appendices

Appendix A Examples of Ray Distributions

We assume all point sets are properly calibrated by a common preprocessing step. This step could also be learned; in fact, one can use RaySense to train such a preprocessor to register the dataset, for example using Sect. 4.4 or similar. For simplicity, in our experiments we generally normalize each point set to lie in the unit \(\ell ^2\) ball, with its center of mass at the origin, unless otherwise indicated.

We present two ways to generate random rays. There is no right way to generate rays, although it is conceivable that one may find optimal ray distributions for specific applications.

Method R1 One simple approach is to generate rays of a fixed length L, whose direction \(\varvec{v}\) is uniformly sampled from the unit sphere. We add a shift \(\varvec{b}\) sampled uniformly from \([-\frac{1}{2}, \frac{1}{2}]^d\) to avoid a bias toward the origin. The \(n_r\) sample points are distributed evenly along the ray:

$$\begin{aligned} \varvec{r}_i = \varvec{b} + L \left( \frac{i}{n_r-1} - \frac{1}{2} \right) \varvec{v} , \qquad i = 0, \cdots , n_r-1. \end{aligned}$$

The spacing between adjacent points on each ray is denoted by \(\delta r\), which is \(\frac{L}{n_r-1}\). We use \(L=2\).

Method R2 Another natural way to generate random rays is by random endpoint selection: choose two random points \(\varvec{p}\), \(\varvec{q}\) on a sphere and connect them to form a ray. Then we sample \(n_r\) evenly spaced points between \(\varvec{p}\) and \(\varvec{q}\) on the ray. To avoid overly short rays, on which information would be redundant, we use a minimum ray-length threshold \(\tau\) to discard rays. Note that the spacing between the \(n_r\) sample points differs from ray to ray:

$$\begin{aligned} \varvec{r}_i = \varvec{p} + \frac{i}{n_r-1} (\varvec{q}-\varvec{p}), \qquad i=0,\cdots ,n_r-1. \end{aligned}$$

The spacing of points on each ray varies, depending on the length of the ray.
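
The following is a minimal NumPy sketch (not the paper's code) of the two ray-generation methods; the function names, the random seed, and the threshold value \(\tau = 0.5\) are illustrative assumptions.

```python
import numpy as np

def rays_method_r1(m, n_r, d=3, L=2.0, seed=0):
    """Method R1: fixed-length rays with uniformly random directions."""
    rng = np.random.default_rng(seed)
    v = rng.standard_normal((m, d))
    v /= np.linalg.norm(v, axis=1, keepdims=True)      # uniform directions on the unit sphere
    b = rng.uniform(-0.5, 0.5, size=(m, d))            # shift to avoid a bias toward the origin
    t = np.linspace(-0.5, 0.5, n_r)                    # i/(n_r - 1) - 1/2, i = 0..n_r-1
    return b[:, None, :] + L * t[None, :, None] * v[:, None, :]   # shape (m, n_r, d)

def rays_method_r2(m, n_r, d=3, tau=0.5, seed=0):
    """Method R2: rays through two random points on the unit sphere."""
    rng = np.random.default_rng(seed)
    rays, t = [], np.linspace(0.0, 1.0, n_r)[:, None]
    while len(rays) < m:
        p, q = rng.standard_normal((2, d))
        p, q = p / np.linalg.norm(p), q / np.linalg.norm(q)
        if np.linalg.norm(q - p) >= tau:               # discard overly short rays
            rays.append(p + t * (q - p))               # n_r evenly spaced samples
    return np.stack(rays)                              # shape (m, n_r, d)

sample_points = rays_method_r1(m=32, n_r=16)           # (32, 16, 3)
```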

Figure 13 shows the density of rays from the ray generation methods. In this paper, we use Method R1; a fixed \(\delta r\) seems to help maintain spatial consistency along the rays, which increases RayNN’s classification accuracy in Sect. 4.5.

Fig. 13 Density of rays from Method R1 (left) and Method R2 (right). The red circle indicates the \(\ell ^2\) ball

Appendix B Implementation Details of RayNN

Our implementation uses PyTorch [41].

Architecture RayNN takes the \(m\times k \times c\) RaySense sketch tensor \({\mathcal {S}}(\varGamma )\) as input, and outputs a K-vector of probabilities, where K is the number of object classes.

The first few layers of the network are blocks of 1D convolution followed by max-pooling, which encode the sketch into a single vector per ray. Convolution and max-pooling are applied along the ray. After this downsizing, we apply a max operation across rays. Figure 14 includes some details. The output of this max operation is fed into fully connected layers with output sizes 256, 64, and K to produce the desired vector of probabilities \(\varvec{{\textbf{p}}}_i \in {\mathbb {R}}^K\). Batch normalization [20] along with ReLU [38] is used for every fully connected and convolution layer.

Note that our network uses convolution along rays to capture local information while the fully connected layers aggregate global information. Between the two, the max operation across rays ensures invariance to the ordering of the rays. It also allows for an arbitrary number of rays to be used during inference. These invariance properties are similar to PointNet’s input-order invariance [44].
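
The following is a minimal PyTorch sketch of a network with this layout. Only the overall structure (1D convolution and max-pooling blocks along each ray, a max across rays, fully connected layers with output sizes 256, 64, and K, batch normalization and ReLU throughout, dropout before the last layer) follows the text; the convolution channel widths, kernel size, and class count are illustrative assumptions, not the paper's implementation.

```python
import torch
import torch.nn as nn

class RayNNSketch(nn.Module):
    def __init__(self, c, K, n_r=16):
        super().__init__()
        widths = [c, 64, 128, 256, 1024]                  # channel widths are assumptions
        blocks = []
        for w_in, w_out in zip(widths[:-1], widths[1:]):
            blocks += [nn.Conv1d(w_in, w_out, kernel_size=3, padding=1),
                       nn.BatchNorm1d(w_out), nn.ReLU(),
                       nn.MaxPool1d(2)]                   # ray length 16 -> 8 -> 4 -> 2 -> 1
        self.per_ray = nn.Sequential(*blocks)
        self.mlp = nn.Sequential(
            nn.Linear(1024, 256), nn.BatchNorm1d(256), nn.ReLU(),
            nn.Linear(256, 64), nn.BatchNorm1d(64), nn.ReLU(),
            nn.Dropout(0.5),                              # dropout on the penultimate layer
            nn.Linear(64, K))

    def forward(self, sketch):                            # sketch: (B, m, n_r, c)
        B, m, n_r, c = sketch.shape
        x = sketch.reshape(B * m, n_r, c).transpose(1, 2)  # treat rays independently
        x = self.per_ray(x).squeeze(-1)                    # (B*m, 1024): one vector per ray
        x = x.reshape(B, m, -1).max(dim=1).values          # max across rays
        return self.mlp(x)                                 # class logits, shape (B, K)

model = RayNNSketch(c=3, K=40)                            # e.g. coordinate features, 40 classes
logits = model(torch.randn(8, 32, 16, 3))                 # 8 sketches, 32 rays, 16 samples per ray
```

A softmax of these logits gives the probability vector \(\varvec{{\textbf{p}}}_i\) described in the text.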

Fig. 14 The RayNN architecture for m rays and \(n_r\) samples per ray. The input is c feature matrices from \({\mathcal {S}}(\varGamma )\) with suitable operations. With \(n_r= 16\), each matrix is downsized to an m-vector by 4 layers of 1D convolution and max-pooling. The max operator is then applied to each of the 1 024 m-vectors. The length-1 024 feature vector is fed into a multi-layer perceptron (MLP) which outputs a vector of probabilities, one for each of the K classes in the classification task. Note that the number of intermediate layers (blue) can be increased based on \(n_r\) and c

Data We apply RayNN to the standard ModelNet10 and ModelNet40 benchmarks [62] for 3D object classification. ModelNet40 consists of 12 311 orientation-aligned [50] meshed 3D CAD models, divided into 9 843 training and 2 468 test objects. ModelNet10 contains 3 991 training and 908 test objects. Following the experimental setup in [44], we sample \(N=1\,024\) points from each of these models and rescale them to be bounded by the unit sphere to form point sets (see Note 1). Our results do not appear to be sensitive to N.

Training During training, we use dropout with ratio 0.5 on the penultimate fully-connected layer. We also augment our training dataset on-the-fly by adding \({\mathcal {N}}(0,0.000\; 4)\) noise to the coordinates. For the optimizer, we use Adam [25] with momentum 0.9 and batch size 16. The learning rate starts at 0.002 and is halved every 100 epochs.
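
Below is a hedged sketch of this training configuration, reusing the RayNNSketch model from the previous sketch. The dummy dataset and the epoch count are placeholders; in the paper the Gaussian noise is added to the point-cloud coordinates, whereas here, as a simplification, it is added to the network input.

```python
import torch
from torch.utils.data import DataLoader, TensorDataset

train_data = TensorDataset(torch.randn(512, 32, 16, 3),          # dummy sketches
                           torch.randint(0, 40, (512,)))         # dummy labels
train_loader = DataLoader(train_data, batch_size=16, shuffle=True)

optimizer = torch.optim.Adam(model.parameters(), lr=0.002, betas=(0.9, 0.999))  # beta1 = 0.9 momentum
scheduler = torch.optim.lr_scheduler.StepLR(optimizer, step_size=100, gamma=0.5)  # halve every 100 epochs
criterion = torch.nn.CrossEntropyLoss()

for epoch in range(200):                                          # placeholder epoch count
    for sketch, label in train_loader:
        sketch = sketch + 0.02 * torch.randn_like(sketch)         # N(0, 0.0004): std = sqrt(0.0004)
        optimizer.zero_grad()
        loss = criterion(model(sketch), label)
        loss.backward()
        optimizer.step()
    scheduler.step()
```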

Inference Our algorithm uses random rays, so it is natural to consider strategies to reduce the variance in the prediction. We consider one simple approach during inference by making an ensemble of predictions from \(\lambda\) different ray sets. The ensemble prediction is based on the average over the \(\lambda\) different probability vectors \({{\textbf{p}}}_i \in {\mathbb {R}}^K\), i.e.,

$$\begin{aligned} \text {Prediction}(\lambda ) = \frac{1}{\lambda } \sum _{i=1}^\lambda {{\textbf{p}}}_i. \end{aligned}$$

The assigned label then corresponds to the entry with the largest probability. We denote the number of rays used during training by m, while the number of rays used for inference is \({\hat{m}}\). Unless otherwise specified, we use \(\lambda =8\), \(m=32\) rays, and \({\hat{m}}=m\).
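
A small sketch of this ensemble rule is given below; `build_sketch`, which forms the RaySense sketch of a point cloud from a freshly drawn ray set, is a hypothetical placeholder.

```python
import torch

@torch.no_grad()
def ensemble_predict(model, point_cloud, build_sketch, lam=8):
    """Average softmax outputs over lam ray sets, then take the argmax."""
    model.eval()
    probs = [model(build_sketch(point_cloud)).softmax(dim=-1)   # p_i in R^K
             for _ in range(lam)]
    return torch.stack(probs).mean(dim=0).argmax(dim=-1)        # ensemble label
```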

Appendix C Details of the Proof of Theorem 1

This appendix contains the proofs of Lemmas 1, 2, and 3, and Theorem 1.

Proof of Lemma 1

The probability measure of \(\varOmega _j\) is

$$\begin{aligned} P_{\varOmega _j} = \int _{\varvec{x}\in \varOmega _j} \rho (\varvec{x}) \textrm{d}\varvec{x} > 0, \end{aligned}$$

which represents the probability of sampling \(\varOmega _j\) when drawing i.i.d. random samples from \(\mu\). For a fixed set of such hypercubes, any \(\varvec{x}\in \text {supp}(\rho )\) will fall in one of the \(\varOmega _j\)’s. Then one can define a mapping \(h{:}\text {supp}(\rho )\subset {\mathbb {R}}^d\rightarrow {\mathbb {R}}\) by

$$\begin{aligned}s = h(\varvec{x}) = j-1, \quad \text {where } \varvec{x}\in \varOmega _j, \; j = \,1,\,2,\,\cdots ,\,M.\end{aligned}$$

By applying the mapping to the random vector \(\varvec{X}\), we obtain a new discrete random variable S with the discrete probability distribution \(\mu _M\) on \({\mathbb {R}}\) and the corresponding density \(\rho _M\). The random variable S lives in a discrete space \(S \in \{0,\,1,\,\cdots ,\,M-1\}\) and \(\rho _M\) is given as a sum of delta spikes as

$$\begin{aligned} \rho _M(s) = \sum _{j=1}^M P_{\varOmega _j}\delta _{j-1}(s).\end{aligned}$$

As a result, sampling from the distribution \(\mu _M\) is equivalent to sampling the hypercubes according to the distribution \(\mu\) in \({\mathbb {R}}^d\), but one cares only about the sample being in a specific hypercube \(\varOmega _j\), not the precise location of the sample. Let \(F_M(s)\) denote the cumulative distribution function corresponding to the density function \(\rho _M(s)\).

Now, given a set of N independent samples of \(\varvec{X}{:}\) \(\{\varvec{X}_i\}_{i=1}^N\subset {\mathbb {R}}^d\), we have a corresponding set of N independent sample points of S: \(\{s_i\}_{i=1}^N\) such that \(\varvec{X}_i\in \varOmega _{s_i+1}\). From there, we can regard the histogram of \(\{s_i\}_{i=1}^N\) as an empirical density of the true density \(\rho _M\). Denote the empirical density by \({\tilde{\rho }}_M^N\) which is given by

$$\begin{aligned} {\tilde{\rho }}_M^N = \frac{1}{N}\sum _{i=1}^{N} \delta _{s_i}.\end{aligned}$$

One can therefore also obtain an empirical cumulative distribution function \({\tilde{F}}^N_M(s)\) using the indicator function \(\chi\):

$$\begin{aligned} {\tilde{F}}^N_M(s) = \frac{1}{N} \sum _{i=1}^{N} \chi _{\{s_i\leqslant s\}}.\end{aligned}$$

By the Dvoretzky-Kiefer-Wolfowitz inequality [13, 36] we have

$$\begin{aligned} Prob \left (\sup _{s\in {\mathbb {R}}}\big|F_M(s)-{\tilde{F}}^N_M(s) \big|>\varepsilon \right )\leqslant 2\text{e}^{-2N\varepsilon ^2} \quad \text { for all }\varepsilon >0.\end{aligned}$$

Therefore, for a desired fixed probability \(p_0\), the above indicates that the approximation error of the empirical \({\tilde{F}}^N_M(s)\) is at most

$$\begin{aligned} \sup _{s\in {\mathbb {R}}}\big|F_M(s)-{\tilde{F}}^N_M(s) \big| \leqslant \varepsilon _N = \bigg (-\frac{1}{2N}\ln {\bigg (\frac{1-p_0}{2}\bigg )}\bigg )^\frac{1}{2}\end{aligned}$$

with probability at least \(p_0\). Then note that the true probability measure \(P_{\varOmega _j}\) of \(\varOmega _j\) being sampled by random drawings from \(\mu\) is equivalent to the true probability of \(j-1\) being drawn from \(\mu _M\), i.e.,

$$\begin{aligned} P_{\varOmega _j} = P_M(j-1):= \rho _M(j-1),\end{aligned}$$

therefore, \(P_{\varOmega _j} = P_M(j-1)\) can be computed from \(F_M\) by

$$\begin{aligned} P_{\varOmega _j}&= F_M(j-1)-F_M(j-2) \\&= F_M(j-1)-{\tilde{F}}^N_M(j-1)+{\tilde{F}}^N_M(j-1)-{\tilde{F}}^N_M(j-2)+{\tilde{F}}^N_M(j-2)-F_M(j-2). \end{aligned}$$

Taking absolute values and using the triangle inequality, for the fixed \(p_0\),

$$\begin{aligned} P_{\varOmega _j} \leqslant 2\varepsilon _N + {\tilde{P}}^N_M(j-1), \end{aligned}$$

where \({\tilde{P}}^N_M(j-1)\) denotes the empirical probability at \(j-1\). Applying the same argument to \({\tilde{P}}^N_M(j-1),\) one has

$$\begin{aligned} |P_{\varOmega _j} - {\tilde{P}}^N_M(j-1) | \leqslant 2\varepsilon _N \quad \text {for all }j=1,\,2,\,\cdots ,\,M.\end{aligned}$$

For a set of N sample points, \({\tilde{P}}^N_M(j-1)\) is computed by \( \frac{N_j}{N}\), where \(N_j\) is the number of times \(j-1\) got sampled by \(\{s_i\}_{i=1}^N\), or equivalently \(\varOmega _j\) got sampled by \(\{\varvec{X}_i\}_{i=1}^N\), which indicates that in practice, with probability at least \(p_0\), the number of sampling points \(N_j\) in \(\varOmega _j\) satisfies the following bound:

$$\begin{aligned} P_{\varOmega _j} - 2\varepsilon _N\leqslant \frac{1}{N}N_j \leqslant P_{\varOmega _j} + 2\varepsilon _N \implies P_{\varOmega _j}N - 2\varepsilon _N N\leqslant N_j \leqslant P_{\varOmega _j}N + 2\varepsilon _N N.\end{aligned}$$

By taking N large enough such that \(P_{\varOmega _j}N -2\varepsilon _N N = 1 \implies N_j \geqslant 1\):

$$\begin{aligned}\implies N = \frac{\sqrt{\ln {\left(\frac{1-p_0}{2}\right)}\bigg (\ln {\left(\frac{1-p_0}{2}\right)}-2P_{\varOmega _j}\bigg )}+P_{\varOmega _j} - \ln {\left(\frac{1-p_0}{2}\right)}}{P_{\varOmega _j}^2}.\end{aligned}$$

The above quantity is clearly a function of the probability measure \(P_{\varOmega _j}\), and any \(\varOmega _i\) with \(P_{\varOmega _i}\geqslant P_{\varOmega _j}\) would have \(N_i\geqslant N_j\geqslant 1\). Using \(\nu\) to denote such a function and \(0<P\leqslant 1\) as the threshold measure completes the first part of the proof:

$$\begin{aligned}\implies \nu \big (P\big ) = \frac{\sqrt{\ln {\left(\frac{2}{1-p_0}\right)}\bigg (\ln {\left(\frac{2}{1-p_0}\right)}+2P\bigg )}+P + \ln {\left(\frac{2}{1-p_0}\right)}}{P^2}.\end{aligned}$$

To establish the bounds on the expression, we note

$$\begin{aligned} \nu (P)&> \frac{\sqrt{\ln {\left(\frac{2}{1-p_0}\right)} \ln {\left(\frac{2}{1-p_0}\right)}} + P + \ln {\left(\frac{2}{1-p_0}\right)}}{P^2}=\frac{2\ln {\left(\frac{2}{1-p_0}\right)}+P}{P^2}, \\ \nu (P)&< \frac{\sqrt{\bigg (\ln {\left(\frac{2}{1-p_0}\right)}+2P\bigg )^2} + P + \ln {\left(\frac{2}{1-p_0}\right)}}{P^2}=\frac{2\ln {\left(\frac{2}{1-p_0}\right)}+3P}{P^2}. \end{aligned}$$
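
As a quick numerical sanity check (not part of the paper), the following snippet evaluates \(\nu(P)\) and the two bounds above for the illustrative choice \(p_0 = 0.95\), \(P = 0.01\).

```python
import numpy as np

p0, P = 0.95, 0.01
A = np.log(2.0 / (1.0 - p0))                      # ln(2 / (1 - p0))

nu    = (np.sqrt(A * (A + 2 * P)) + P + A) / P**2
lower = (2 * A + P) / P**2
upper = (2 * A + 3 * P) / P**2
print(lower < nu < upper)                          # True: the bounds bracket nu(P)
print(nu)                                          # about 7.4e4 samples
```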

Proof of Lemma 2

Consider a local hypercube centered at \(\varvec{y}\), \(\varOmega _{\varvec{y}}:= \{\varvec{x}+\varvec{y}\in {\mathbb {R}}^d{:} \Vert \varvec{x}\Vert _\infty \leqslant \frac{l}{2}\}\), of side length l to be determined. We shall just say “cube”. The probability of cube \(\varOmega _{\varvec{y}}\) being sampled is given by \(P_{\varOmega _{\varvec{y}}} = \int _{\varOmega _{\varvec{y}}} \rho (\varvec{x})\textrm{d}\varvec{x}\). Now, for the set of standard basis vectors \(\{\varvec{e}_i\}_{i=1}^d\), let \(\varvec{v}_d\) denote the sum of all the basis vectors: \(\varvec{v}_d:=\sum _{i=1}^d \varvec{e}_i\). Without loss of generality, the probability of a diagonal cube, defined by \(\varOmega _{\varvec{y}_d}:=\{\varvec{x}+\varvec{y}+l\varvec{v}_d \in {\mathbb {R}}^d{:} \Vert \varvec{x}\Vert _\infty \leqslant \frac{l}{2}\}\), being sampled (unconditionally on \(\varOmega _{\varvec{y}}\) being sampled) has the following bound by the Lipschitz continuity of \(\rho\):

$$\begin{aligned} \big|P_{\varOmega _{\varvec{y}_d}} - P_{\varOmega _{\varvec{y}}} \big|\leqslant \int _{\varOmega _{\varvec{y}}} |\rho (\varvec{x}+l \varvec{v}_d) - \rho (\varvec{x}) |\textrm{d}\varvec{x} \leqslant L\sqrt{d}\,l|\varOmega _{\varvec{y}} | \implies P_{\varOmega _{\varvec{y}_d}} \geqslant P_{\varOmega _{\varvec{y}}} - L\sqrt{d}\,l^{d+1}. \end{aligned}$$

Furthermore, \(P_{\varOmega _{\varvec{y}}}\) has the following lower bound also by Lipschitz continuity of \(\rho\). For any \(x \in \varOmega _{\varvec{y}}\), we have

$$\begin{aligned} |\rho (\varvec{x}) - \rho (\varvec{y}) | \leqslant L \sqrt{d} \frac{l}{2} \implies \rho (\varvec{x})\geqslant \rho (\varvec{y})-L\frac{\sqrt{d}}{2}l \implies P_{\varOmega _{\varvec{y}}}\geqslant \left( \rho (\varvec{y})-L\frac{\sqrt{d}}{2}l\right) l^d. \end{aligned}$$
(21)

Combining with the previous bound for \(P_{\varOmega _{\varvec{y}_d}}\), we further have

$$\begin{aligned} P_{\varOmega _{\varvec{y}_d}} \geqslant \left( \rho (\varvec{y})-L\frac{\sqrt{d}}{2}l\right) l^d - L\sqrt{d}l^{d+1} = \rho (\varvec{y})l^d-\frac{3\sqrt{d}}{2}Ll^{d+1}. \end{aligned}$$

By setting \(\rho (\varvec{y}) > \frac{3\sqrt{d}}{2}Ll\) we can ensure \(P_{\varOmega _{\varvec{y}_d}} >0\), but this extreme lower bound is based only on Lipschitz continuity. To obtain a more useful bound, we will show below that by picking \(l:= l_N\) judiciously, with \(\rho (\varvec{y})> 3\sqrt{d}Ll_N>0\), any surrounding cube has a non-zero probability of being sampled. Therefore, with \(\rho (\varvec{y}) > 3\sqrt{d}Ll_N\), for any diagonal cube \(\varOmega _{\varvec{y}_d}\):

$$\begin{aligned} \rho (\varvec{y})l^d - \frac{1}{2}\rho (\varvec{y})l^d> \frac{3\sqrt{d}}{2}Ll^{d+1} \implies P_{\varOmega _{\varvec{y}_d}} > \frac{1}{2}\rho (\varvec{y})l_N^d.\end{aligned}$$

Since the diagonal cube is the farthest from \(\varvec{y}\) among all the surrounding cubes, the same lower bound applies to the probability measure of every surrounding cube of \(\varOmega _{\varvec{y}}\).

According to Lemma 1, for N sampling points, with probability at least \(p_0\), if a region has probability measure \(\geqslant P_N\), then there is at least one point sampled in that region, where \(P_N\) is the threshold probability depending on N obtained by solving the equation below:

$$\begin{aligned} N = \frac{\sqrt{\ln {\bigg (\frac{2}{1-p_0}\bigg )}\bigg (\ln {\bigg (\frac{2}{1-p_0}\bigg )}+2P_N\bigg )}+P_N + \ln {\bigg (\frac{2}{1-p_0}\bigg )}}{(P_N)^2}.\end{aligned}$$

By the bounds for N in (17) of Lemma 1, we know there is some constant \(c\in (1,3)\) s.t.:

$$\begin{aligned} N = \frac{2\ln {\bigg (\frac{2}{1-p_0}\bigg )}+cP_N}{P_N^2} \implies NP_N^2 - cP_N -2\ln {\bigg (\frac{2}{1-p_0}\bigg )} =0. \end{aligned}$$

Solving the above quadratic equation and noting that \(P_N>0\), we have

$$\begin{aligned}P_N = \frac{c+\bigg (c^2+8N\ln {\bigg (\frac{2}{1-p_0}\bigg )}\bigg )^{\frac{1}{2}}}{2N}.\end{aligned}$$

Therefore, for a fixed N, by requiring

$$\begin{aligned}P_{\varOmega _{\varvec{y}_d}}>\frac{\rho (\varvec{y})}{2}l_N^{d} \geqslant P_N \implies l_N \geqslant \left( \frac{c+\bigg (c^2+8N\ln {\bigg (\frac{2}{1-p_0}\bigg )}\bigg )^{\frac{1}{2}}}{\rho (\varvec{y})N}\right) ^{\frac{1}{d}},\end{aligned}$$

we have with probability at least \(p_0\) that in every surrounding cube of \(\varOmega _{\varvec{y}}\) of side length \(l_N\), there is at least one point. This lower bound for \(l_N\) ensures the surrounding cubes have enough probability measure to be sampled. Since \(1<c<3\), we can just take \(l_N\) to be

$$\begin{aligned}l_N:= \left( \frac{3+\bigg (9+8N\ln {\left(\frac{2}{1-p_0}\right)}\bigg )^{\frac{1}{2}}}{\rho (\varvec{y})N}\right) ^{\frac{1}{d}}>\left( \frac{c+\bigg (c^2+8N\ln {\left(\frac{2}{1-p_0}\right)}\bigg )^{\frac{1}{2}}}{\rho (\varvec{y})N}\right) ^{\frac{1}{d}}. \end{aligned}$$

From above we see that for a fixed \(\rho (\varvec{y})\), \(l_N\) decreases as N increases. Therefore, by choosing N large enough, we can always satisfy the prescribed assumption \(\rho (\varvec{y})\geqslant 3\sqrt{d}Ll_N\).

Furthermore, when N is so large that \(\rho (\varvec{y})\geqslant 3\sqrt{d}Ll_N\) is always satisfied, we see that \(l_N\) is a decreasing function of \(\rho\): with a higher local density \(\rho (\varvec{y})\), \(l_N\) can be taken smaller while the sampling statement still holds, meaning the local region is more compact.

Finally, since there is a point in every surrounding cube of \(\varOmega _{\varvec{y}}\), the diameter of the Voronoi cell of \(\varvec{y}\) has the following upper bound with the desired probability \(p_0\):

$$\begin{aligned} \text {diam}(V(\varvec{y})) \leqslant 3l\sqrt{d} = 3\sqrt{d} \left( \frac{3+\bigg (9+8N\ln {\left(\frac{2}{1-p_0}\right)}\bigg )^{\frac{1}{2}}}{\rho (\varvec{y})N}\right) ^{\frac{1}{d}}.\end{aligned}$$

Now, for a sample point \(\varvec{x}_0\) in the interior of \(\text {supp}(\rho )\), given a cover of cubes as in Lemma 1, \(\varvec{x}_0\) must belong to one of the cubes, whose center is also denoted by \(\varvec{y}\) with a slight abuse of notation. Then note that the diameter of \(V(\varvec{x}_0)\) also has the same upper bound as shown above. To go from \(\rho (\varvec{y})\) to \(\rho (\varvec{x}_0)\), by Lipschitz continuity: \(\rho (\varvec{y})\geqslant \rho (\varvec{x}_0) - \frac{L\sqrt{d}}{2}l_N \implies \rho (\varvec{x}_0) \leqslant \rho (\varvec{y})+\frac{L\sqrt{d}}{2}l_N\). Since we require \(\rho (\varvec{y})\geqslant 3\sqrt{d}Ll_N\), we have \(\rho (\varvec{x}_0)\leqslant \frac{\rho (\varvec{y})}{6}+\rho (\varvec{y})=\frac{7}{6}\rho (\varvec{y})\). Therefore,

$$\begin{aligned}\rho (\varvec{y}) \geqslant \frac{6}{7}\rho (\varvec{x}_0) \implies \text {diam}(V(\varvec{x}_0)) \leqslant 3\sqrt{d} \left( \frac{21+7\bigg (9+8N\ln {\left(\frac{2}{1-p_0}\right)}\bigg )^{\frac{1}{2}}}{6\rho (\varvec{x}_0)N}\right) ^{\frac{1}{d}}.\end{aligned}$$
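
For illustration (not from the paper), the following snippet evaluates \(l_N\) and the Voronoi-diameter bound for \(d = 3\), \(p_0 = 0.95\), \(\rho(\varvec{x}_0) = 1\) and several values of N; both decay like \(N^{-\frac{1}{2d}}\) up to constants.

```python
import numpy as np

d, p0, rho = 3, 0.95, 1.0
A = np.log(2.0 / (1.0 - p0))

for N in [10**4, 10**6, 10**8]:
    l_N  = ((3 + np.sqrt(9 + 8 * N * A)) / (rho * N)) ** (1.0 / d)
    diam = 3 * np.sqrt(d) * ((21 + 7 * np.sqrt(9 + 8 * N * A)) / (6 * rho * N)) ** (1.0 / d)
    print(N, l_N, diam)
```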

Proof of Lemma 3

Without loss of generality, we assume that \(|\text {supp}(\rho ) |=1\); then \(\rho =1\) everywhere within its support. We partition \(\text {supp}(\rho )\) into M regions such that each region has probability measure \(\frac{1}{M}\). This partition can be constructed in the following way: for most of the interior of \(\text {supp}(\rho )\), subdivide into hypercubes \(\varOmega _j\)’s of the same size such that \(P_{\varOmega _j} = \frac{1}{M}\) and the \(\varOmega _j\)’s are contained completely inside \(\text {supp}(\rho )\). Then the side length of the hypercube, l, is determined by \(\frac {l^d}{|\text {supp}(\rho ) |} = \frac {1}{M} \implies l = M^{-\frac{1}{d}}\). For the remaining uncovered regions of \(\text {supp}(\rho )\), cover with some small cubes of appropriate sizes and combine them together to obtain a region with measure \(\frac{1}{M}\).

Then, following a similar idea from Lemma 2, one has a discrete sampling problem with equal probability for each candidate, which resembles the coupon collector problem. The probability \(p(N,\,d,\,M)\) that each of the M regions contains at least one sample point has a well-known lower bound [11]:

$$\begin{aligned}p(N,d,M) \geqslant 1-M\text{e}^{-\frac{N}{M}}.\end{aligned}$$

With the probability \(p(N,\,d,\,M)\) given above, for an interior hypercube we again have that there is at least one sample in each of its surrounding hypercubes, since now there is at least one sample in each of the M regions. Then the Voronoi diameter for each point is at most \(3l\sqrt{d}\). Fixing a desired probability \(p_0\), we want to determine the number of regions M to get a control on l. We need a bound as follows:

$$\begin{aligned} p\geqslant 1-M\text{e}^{-\frac{N}{M}} \geqslant p_0 \implies 0< M\text{e}^{-\frac{N}{M}} \leqslant 1-p_0. \end{aligned}$$
(22)

By rearranging, the equality \(M\text{e}^{-\frac{N}{M}} = 1-p_0\) holds exactly when

$$\begin{aligned} \frac{N}{M}\text{e}^{\frac{N}{M}} = \frac{N}{1-p_0}.\end{aligned}$$

The above equation is solvable by using the Lambert W function:

$$\begin{aligned} M = \frac{N}{W_0\big (\frac{N}{1-p_0}\big)},\end{aligned}$$

where \(W_0\) is the principal branch of the Lambert W function. Note that the Lambert W function satisfies

$$\begin{aligned} W_0(x) \text{e}^{W_0(x)} = x \implies \frac{x}{W_0(x)} = \text{e}^{W_0(x)}.\end{aligned}$$

Plugging in the above identity, one has

$$\begin{aligned} \frac{M}{1-p_0} = \frac{N}{(1-p_0)W_0\big (\frac{N}{1-p_0}\big)} \implies M = (1-p_0)\text{e}^{W_0 \left(\frac{N}{1-p_0}\right )}.\end{aligned}$$

Also note that the function \(M\text{e}^{-\frac{N}{M}}\) is monotonically increasing in M (for \(M>0\)), so for the bound in (22) to hold we require

$$\begin{aligned}M \leqslant (1-p_0)\text{e}^{W_0\left(\frac{N}{1-p_0}\right)}.\end{aligned}$$

By taking the largest possible integer M satisfying the above inequality, we then have

$$\begin{aligned} l = \left(\frac{1}{M}\right)^{\frac{1}{d}} = \bigg (\lfloor (1-p_0)\text{e}^{W_0 \left(\frac{N}{1-p_0}\right)} \rfloor \bigg)^{-\frac{1}{d}}\end{aligned}$$

for every hypercube contained in \(\text {supp}(\rho )\). Then this yields a uniform bound for the Voronoi diameter of any point that is in an interior hypercube surrounded by other interior hypercubes:

$$\begin{aligned}\text {diam}(V) \leqslant 3\sqrt{d} \bigg (\lfloor (1-p_0)\text{e}^{W_0\left(\frac{N}{1-p_0}\right)} \rfloor \bigg)^{-\frac{1}{d}}. \end{aligned}$$

In terms of the limiting behavior, for large x, the Lambert W function is asymptotic to the following [6, 10]:

$$\begin{aligned} W_0(x) = \ln {x} - \ln {\ln {x}} + o(1) \implies \text{e}^{W_0(x)} = c(x) \frac{x}{\ln x}\end{aligned}$$

with \(c(x)\rightarrow 1\) as \(x\rightarrow \infty\). Therefore, for sufficiently large \(\frac {N} {\left(1-p_0 \right)}\), we have

$$\begin{aligned} \text {diam}(V) \leqslant 3\sqrt{d}\left( \left\lfloor (1-p_0)c\frac{N}{(1-p_0)\ln {\frac{N}{1-p_0}}}\right\rfloor \right) ^{-\frac{1}{d}} \approx 3\sqrt{d}\left( \frac{1}{cN}\ln {\frac{N}{1-p_0}}\right) ^{\frac{1}{d}}. \end{aligned}$$
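
The following snippet (not from the paper) compares the exact Lambert W expression for this bound, evaluated with scipy.special.lambertw, against the asymptotic form for \(d = 3\) and \(p_0 = 0.95\).

```python
import numpy as np
from scipy.special import lambertw

d, p0 = 3, 0.95
for N in [10**4, 10**6, 10**8]:
    x = N / (1.0 - p0)
    M = np.floor((1.0 - p0) * np.exp(lambertw(x).real))       # number of regions in Lemma 3
    exact = 3 * np.sqrt(d) * M ** (-1.0 / d)                   # bound via the Lambert W expression
    asymp = 3 * np.sqrt(d) * (np.log(x) / N) ** (1.0 / d)      # asymptotic form, c(x) ~ 1
    print(N, exact, asymp)
```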

Proof of Theorem 1

Note that when using one ray: \(S[\varGamma _1](1, j) = \varvec{x}_{1[j]}\) and \(S[\varGamma _2](1, j) = \varvec{x}_{2[j]}\) for \(j=1,2,\cdots ,n_r\). The main idea is to bound the difference between each pair of points using the results introduced in the previous lemmas. Consider a fixed sampling point \(\varvec{r}_{1,j}\in \varvec{r}(s)\) whose corresponding closest points are \(\varvec{x}_{1[j]}\) and \(\varvec{x}_{2[j]}\) in \(\varGamma _1\) and \(\varGamma _2\), respectively. We consider two cases: first when \(\varvec{r}_{1,j}\) is interior to \(\text {supp}(\rho )\), in which case from Lemmas 2 and 3, with probability \(p_0\) we have a bound for the diameter of the Voronoi cell of any interior \(\varvec{x}\), which we denote by \(D(\varvec{x})\), where \(\rho ,\,p_0,\, N,\) and d are assumed to be fixed. Therefore,

$$\begin{aligned} \Vert \varvec{x}_{1[j]}-\varvec{r}_{1,j} \Vert _2 \leqslant D(\varvec{x}_{1[j]});\quad \Vert \varvec{x}_{2[j]}-\varvec{r}_{1,j} \Vert _2 \leqslant D(\varvec{x}_{2[j]}).\end{aligned}$$

Then by the triangle inequality: \(\Vert \varvec{x}_{1[j]}-\varvec{x}_{2[j]} \Vert _2 \leqslant D(\varvec{x}_{1[j]})+ D(\varvec{x}_{2[j]})\), which applies for all sampling points \(\varvec{r}_{1,j}\) in the interior of \(\text {supp}(\rho )\), and as \(N\rightarrow \infty\) we have \(D\rightarrow 0\) at the rate derived in Lemma 2.

In the case where the sampling point \(\varvec{r}_{1,j}\in \varvec{r}(s)\) is outside of \(\text {supp}(\rho )\), since \(\text {supp}(\rho )\) is convex, the closest point to \(\varvec{r}_{1,j}\) in \(\text {supp}(\rho )\) is always unique; denote it by \(\varvec{x}_{\rho }\). Then, choose \(R_1\) depending on \(N_1,\,N_2\) such that the probability measure \(P_1 = P\big (B_{R_1}(\varvec{x}_{\rho })\cap \text {supp}(\rho ) \big )\) achieves the threshold introduced in Lemma 1, so that there exist at least one \(\varvec{x}_{\rho ,1}\in \varGamma _1\) and one \(\varvec{x}_{\rho ,2}\in \varGamma _2\) lying in \(B_{R_1}(\varvec{x}_{\rho })\cap \text {supp}(\rho )\). For sufficiently large N, \(\varvec{x}_{1[j]}\) and \(\varvec{x}_{2[j]}\) are then points inside \(B_{R_1}(\varvec{x}_{\rho })\cap \text {supp}(\rho )\) since \(\text {supp}(\rho )\) is convex. Then we have

$$\begin{aligned} \Vert \varvec{x}_{1[j]} - \varvec{x}_{2[j]} \Vert _2\leqslant 2R_1, \end{aligned}$$

and we can pick \(N_1, N_2\) large to make \(R_1\) as small as desired. Therefore, we have

$$\begin{aligned} \Vert \varvec{x}_{1[j]}-\varvec{x}_{2[j]} \Vert _2 \rightarrow 0 \text { as } N_1, N_2\rightarrow \infty , \end{aligned}$$

in probability, for both interior and exterior sampling points \(\varvec{r}_{1,j}\), once \(N_1, N_2\) are sufficiently large. Consequently, for the RaySense matrices \(S[\varGamma _1]\) and \(S[\varGamma _2]\), we can always find N sufficiently large such that

$$\begin{aligned} \Vert S[\varGamma _1]-S[\varGamma _2] \Vert _F = \sqrt{\sum ^{n_r}_{j=1} \Vert \varvec{x}_{1[j]}-\varvec{x}_{2[j]} \Vert _2^2} \leqslant \varepsilon \end{aligned}$$

for arbitrarily small \(\varepsilon\) depending on N, \(n_r\), d, and the geometry of \(\text {supp}(\rho )\).

Remark 3

In the case of non-convex \(\text {supp}(\rho )\) with the sampling point \(\varvec{r}_{1,j}\in \varvec{r}(s)\) outside of \(\text {supp}(\rho )\): if the ray \(\varvec{r}\) is drawn from some distribution \({\mathcal {L}}\), then with probability one \(\varvec{r}_{1,j}\) is not equidistant to two or more points on \(\text {supp}(\rho )\), so the closest point is uniquely determined, and we only need to worry about the case that \(\varvec{r}_{1,j}\) finds a closest point \(\varvec{x}_{2[j]}\) from \(\varGamma _2\) that would be far away from \(\varvec{x}_{1[j]}\).

Let \(\varvec{x}_{\rho }\) be the closest point of \(\varvec{r}_{1,j}\) in \(\text {supp}(\rho )\). Similarly to before, choose \(R_1\) depending on \(N_1,N_2\) such that the probability measure \(P_1 = P\big (B_{R_1}(\varvec{x}_{\rho })\cap \text {supp}(\rho ) \big )\) achieves the threshold introduced in Lemma 1, so that there exist at least one \(\varvec{x}_{\rho ,1}\in \varGamma _1\) and one \(\varvec{x}_{\rho ,2}\in \varGamma _2\) lying in \(B_{R_1}(\varvec{x}_{\rho })\cap \text {supp}(\rho )\). Now, consider the case where the closest point \(\tilde{\varvec{x}}_{\rho }\) of \(\varvec{r}_{1,j}\) from the partial support \(\text {supp}(\rho )\setminus B_{R_1}(\varvec{x}_{\rho })\) is far from \(\varvec{x}_{\rho }\) due to the non-convex geometry, and denote

$$\begin{aligned} \delta = \Vert \varvec{r}_{1,j}-\tilde{\varvec{x}}_{\rho } \Vert _2 - \Vert \varvec{r}_{1,j} - \varvec{x}_{\rho } \Vert _2 >0. \end{aligned}$$

We pick \(N_1,\,N_2\) so large that \(R_1\leqslant \delta\). Then, for \(i=1,\,2\),

$$\begin{aligned} \Vert \varvec{r}_{1,j} - \varvec{x}_{\rho ,i} \Vert \leqslant \Vert \varvec{r}_{1,j} - \varvec{x}_{\rho } \Vert + R_1 \leqslant \Vert \varvec{r}_{1,j} - \varvec{x}_{\rho } \Vert + \delta \leqslant \Vert \varvec{r}_{1,j} - \tilde{\varvec{x}}_{\rho } \Vert , \end{aligned}$$

implying that \(\varvec{x}_{\rho ,1}\in \varGamma _1\) and \(\varvec{x}_{\rho ,2}\in \varGamma _2\) are closer to \(\varvec{r}_{1,j}\) than any point of the partial support \(\text {supp}(\rho )\setminus B_{R_1}(\varvec{x}_{\rho })\). Therefore, for sufficiently large \(N_1\) and \(N_2\), both closest points \(\varvec{x}_{1[j]}\), \(\varvec{x}_{2[j]}\) of \(\varvec{r}_{1,j}\) from \(\varGamma _1\) and \(\varGamma _2\) lie inside \(B_{R_1}(\varvec{x}_{\rho })\cap \text {supp}(\rho )\), \(\implies \Vert \varvec{x}_{1[j]}-\varvec{x}_{2[j]} \Vert _2 \leqslant 2R_1\). The rest follows identically as in the previous proof.

Appendix D Details of the Proof of Theorem 3

Before deriving the result, we first take a detour to investigate the problem under the setting of the Poisson point process, as a means of generating points in a uniform distribution.

Appendix D.1 Poisson Point Process

A Poisson point process [9] on \(\varOmega\) is a collection of random points such that the number of points \(N_j\) in any bounded measurable subset \(\varOmega _j\subset \varOmega\) with measure \(|\varOmega _j |\) is a Poisson random variable with rate \(\lambda |\varOmega _j |\), i.e., \(N_j\sim {\text {Poi}}(\lambda |\varOmega _j |)\). In other words, we take N, instead of being fixed, to be a Poisson random variable: \(N\sim {\text {Poi}}(\lambda )\), where the rate parameter \(\lambda\) is a constant. Therefore, the underlying Poisson process is homogeneous and it also enjoys the complete independence property, i.e., the numbers of points in disjoint and bounded subregions are completely independent of one another.

What follows naturally from these properties is that the spatial locations of the points generated by the Poisson process are uniformly distributed. As a result, each realization of the homogeneous Poisson process is a uniform sampling of the underlying space with the number of points \(N\sim {\text {Poi}}(\lambda )\).
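
A minimal sketch of drawing one realization of such a process on the unit cube \([0,1]^d\) is given below; the rate value is an illustrative choice.

```python
import numpy as np

def poisson_point_process(lam, d=3, seed=0):
    """One realization of a homogeneous Poisson point process on [0, 1]^d."""
    rng = np.random.default_rng(seed)
    N = rng.poisson(lam)                        # random number of points, N ~ Poi(lam)
    return rng.uniform(0.0, 1.0, size=(N, d))   # given N, locations are i.i.d. uniform

points = poisson_point_process(lam=1000)        # roughly 1000 uniform points in [0, 1]^3
```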

Below we state a series of useful statistical properties and a concentration inequality for the Poisson random variable.

  • The Poisson random variable \(N\sim {\text {Poi}}(\lambda )\) has mean and variance both \(\lambda\):

    $$\begin{aligned}{\mathbb {E}}(N) = Var(N) = \lambda .\end{aligned}$$
  • The corresponding probability density function is

    $$\begin{aligned} {\mathbb {P}}({N=k}) = \frac{\text{e}^{-\lambda }\lambda ^k}{k!}.\end{aligned}$$
  • A useful concentration inequality [43] (N scales linearly with \(\lambda\)):

    $$\begin{aligned} {\mathbb {P}}(N\leqslant \lambda -\varepsilon ) \leqslant \text{e}^{-\frac{\varepsilon ^2}{2(\lambda +\varepsilon )}}\; \text { or } \;{\mathbb {P}}(|N-\lambda |\geqslant \varepsilon ) \leqslant 2 \text{e}^{-\frac{\varepsilon ^2}{2(\lambda +\varepsilon )}}. \end{aligned}$$
    (23)

Furthermore, one can also derive a Markov-type inequality, different from (23), for the event that a Poisson random variable \(N\sim {\text {Poi}}(\lambda )\) is larger than some constant \(a>\lambda\) that is independent of \(\lambda\).

Proposition 2

For Poisson random variable \(N\sim {\text {Poi}}(\lambda )\), it satisfies the following bound for any constant \(a>\lambda\):

$$\begin{aligned} {\mathbb {P}}(N \geqslant a) \leqslant \frac{\text{e}^{(a-\lambda )}\lambda ^a}{a^a} \iff {\mathbb {P}}(N < a) \geqslant 1 - \frac{\text{e}^{(a-\lambda )}\lambda ^a}{a^a}. \end{aligned}$$
(24)

Proof

By Markov’s inequality:

$$\begin{aligned} {\mathbb {P}}(N \geqslant a)&= {\mathbb {P}}(\text{e}^{tN}\geqslant \text{e}^{ta})\leqslant \inf _{t>0}\frac{{\mathbb {E}}(\text{e}^{tN})}{\text{e}^{ta}} = \inf _{t>0}\frac{\sum _{k=0}^{\infty } \text{e}^{tk}\frac{\lambda ^k\text{e}^{-\lambda }}{k!}}{\text{e}^{ta}} \\&= \inf _{t>0}\frac{\text{e}^{-\lambda }\sum _{k=0}^{\infty } \frac{(\text{e}^{t}\lambda )^k}{k!}}{\text{e}^{ta}} = \inf _{t>0}\frac{\text{e}^{-\lambda } \text{e}^{\text{e}^t\lambda }}{\text{e}^{ta}} = \inf _{t>0}\frac{\text{e}^{(\text{e}^t-1)\lambda }}{\text{e}^{ta}}. \end{aligned}$$

To get a tighter bound, we want to minimize the R.H.S. Let \(\zeta =\text{e}^t>1\). Then we minimize the R.H.S. over \(\zeta\):

$$\begin{aligned} \min _{\zeta>1} \frac{\text{e}^{(\zeta -1)\lambda }}{\zeta ^a} \iff \min _{\zeta >1} (\zeta -1)\lambda - a \log (\zeta ).\end{aligned}$$

A simple derivative test yields the global minimizer \(\zeta = \frac{a}{\lambda }>1\) since we require \(a>\lambda\). Thus,

$$\begin{aligned} {\mathbb {P}}(N \geqslant a) \leqslant \frac{\text{e}^{(a-\lambda )}\lambda ^a}{a^a} \iff {\mathbb {P}}(N < a) \geqslant 1 - \frac{\text{e}^{(a-\lambda )}\lambda ^a}{a^a}. \end{aligned}$$
(25)
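
As an illustrative numerical check (not from the paper), the snippet below compares the bound (24) with the exact Poisson tail computed by scipy.stats.poisson for \(\lambda = 50\) and a few values of \(a > \lambda\).

```python
import numpy as np
from scipy.stats import poisson

lam = 50.0
for a in [60, 75, 100]:                          # any a > lambda
    exact = poisson.sf(a - 1, lam)               # P(N >= a)
    bound = np.exp(a - lam) * lam**a / a**a      # right-hand side of (24)
    print(a, bool(exact <= bound), exact, bound)
```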

A direct consequence of (23) is that one can identify N with \(\lambda\) with high probability when \(\lambda\) is large, or equivalently the other way around.

Lemma 4

(Identify N with \(\lambda\)) A point set of cardinality \(N^*\) drawn from a uniform distribution, with high probability, can be regarded as a realization of a Poisson point process with rate \(\lambda\) such that

$$\begin{aligned} {\mathbb {P}}\bigg (\frac{2N^*}{3}\leqslant \lambda \leqslant 2N^*\bigg ) \geqslant 1 - \text{e}^{-\frac{N^*}{6}} -\text{e}^{-\frac{N^*}{18}}. \end{aligned}$$

Proof

If \(N\sim {\text {Poi}}(\lambda )\), by taking \(\varepsilon = \frac{\lambda}{2}\), from (23) we have

$$\begin{aligned} {\mathbb {P}}\bigg (|N-\lambda |<\frac{\lambda }{2}\bigg ) \geqslant 1- 2\text{e}^{-\frac{\lambda }{12}} \iff {\mathbb {P}}\bigg ( \frac{\lambda }{2}< N < \frac{3\lambda }{2} \bigg ) \geqslant 1- 2\text{e}^{-\frac{\lambda }{12}}. \end{aligned}$$

Let \(\lambda _u = 2N^*\) as a potential upper bound for \(\lambda\), while \(\lambda _l = \frac{2N^*}{3}\) the potential lower bound. Then for \(N_u\sim {\text {Poi}}(\lambda _u)\) and \(N_l\sim {\text {Poi}}(\lambda _l)\):

$$\begin{aligned} {\mathbb {P}}(N_u\leqslant N^*)&= {\mathbb {P}}\bigg (N_u\leqslant \frac{\lambda _u}{2}\bigg ) \leqslant \text{e}^{-\frac{\lambda _u}{12}}, \\ {\mathbb {P}}(N_l\geqslant N^*)&= {\mathbb {P}}\bigg (N_l\geqslant \frac{3\lambda _l}{2}\bigg ) \leqslant \text{e}^{-\frac{\lambda _l}{12}}. \end{aligned}$$

Therefore, if we have some other Poisson processes with rates \(\lambda _1 >\lambda _u\) and \(\lambda _2<\lambda _l\), the probabilities of the corresponding Poisson variables \(N_1\sim {\text {Poi}}(\lambda _1), N_2\sim {\text {Poi}}(\lambda _2)\) achieving at most (respectively, at least) \(N^*\) are bounded by

$$\begin{aligned} {\mathbb {P}}(N_1\leqslant N^*)&< {\mathbb {P}}(N_u\leqslant N^*) \leqslant \text{e}^{-\frac{\lambda _u}{12}} = \text{e}^{-\frac{N^*}{6}},\\ {\mathbb {P}}(N_2 \geqslant N^*)&< {\mathbb {P}}(N_l \geqslant N^*) \leqslant \text{e}^{-\frac{\lambda _l}{12}} = \text{e}^{-\frac{N^*}{18}}. \end{aligned}$$

Note that both of the events have probabilities decaying to 0 as the observation \(N^* \rightarrow \infty\). Therefore, we have a confidence interval with left margin \(\text{e}^{-\frac{N^*}{6}}\) and right margin \(\text{e}^{-\frac{N^*}{18}}\) to conclude that the Poisson parameter \(\lambda\) behind the observation \(N^*\) has the bound

$$\begin{aligned}\frac{2N^*}{3}\leqslant \lambda \leqslant 2N^*.\end{aligned}$$

Since the margins shrink to 0 as \(N^* \rightarrow \infty\), we can identify \(\lambda\) with \(cN^*\) for some constant c around 1 with high probability.
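
A small Monte Carlo check (not from the paper) of this interval: for a fixed rate \(\lambda\), almost every observed count \(N^*\) satisfies \(\frac{2N^*}{3}\leqslant \lambda \leqslant 2N^*\).

```python
import numpy as np

rng, lam, trials = np.random.default_rng(0), 200.0, 100_000
N_star = rng.poisson(lam, size=trials)                 # observed counts
covered = (2 * N_star / 3 <= lam) & (lam <= 2 * N_star)
print(covered.mean())                                  # close to 1, consistent with the margins
```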

By Lemma 4, for the remainder we will approach the proof of Theorem 3 from a Poisson process perspective and derive results in terms of the Poisson parameter \(\lambda\).

Appendix D.2 Main Ideas of the Proof of Theorem 3

Consider a Poisson process with parameter \(\lambda\) in \(\text {supp}(\rho )\) and a corresponding point cloud \(\varGamma\) with cardinality \(N\sim {\text {Poi}}(\lambda )\). Based on the previous discussion in Sect. 3.1.3, we assume the ray \(\varvec{r}(s)\) lies entirely in the interior of \(\text {supp}(\rho )\). From Theorem 2, denoting by \(1\leqslant k_i\leqslant N\) the indices such that \(\{\varvec{x}_{k_i}\}_{i=1}^M\subset \varGamma\) are the points in \(\varGamma\) sensed by the ray, and writing \(V_{k_i}:= V(\varvec{x}_{k_i})\), the line integral error is equivalently

$$\begin{aligned} \bigg |\int _0^1 g\big (\varvec{r}(s)\big ) \textrm{d}s - \int _0^1 g\big (\varvec{x}_{k(s)}\big )\textrm{d}s \bigg | \leqslant J\sum _{i=1}^M \int _0^1 \chi \big (\{\varvec{r}(s)\in V_{k_i}\}\big )\Vert \varvec{r}(s) - \varvec{x}_{k_i} \Vert \textrm{d}s. \end{aligned}$$
(26)

To bound the above quantity, one needs to simultaneously bound M, the number of Voronoi cells the ray passes through; the length of \(\varvec{r}(s)\) staying inside each \(V_{k_i}\); and the distance of \(\varvec{r}(s)\) to the corresponding \(\varvec{x}_{k_i}\). Our key intuition is stated as follows.

Divide the ray \(\varvec{r}(s)\) of length 1 into segments each of length h, and consider a hypercylinder of height h and radius h centered around each segment. If there is at least one point from \(\varGamma\) in each of the hypercylinders, then no point along \(\varvec{r}(s)\) will have its nearest neighbor further than distance \(H = \sqrt{2}h\) away from \(\varvec{r}(s)\). Therefore, we restrict our focus to \(\varOmega\), a tubular neighborhood of distance H around \(\varvec{r}(s)\), that is, a “baguette-like” region with spherical end caps. \(N_{\varOmega }\), the number of points of \(\varGamma\) that are inside \(\varOmega\), will serve as an upper bound for M (the total number of unique nearest neighbors of \(\varvec{r}(s)\) in \(\varGamma\)), while the control of the other two quantities (intersecting length and distances to closest points) comes up naturally.
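
Before proceeding, the following snippet gives an illustrative numerical version (not the paper's code) of the error quantity in (26): it compares the discretized line integral of a test function g along a ray with the same sum evaluated at the nearest neighbors in a uniformly sampled point cloud, using a k-d tree for the closest-point queries. The integrand, the ray, and all parameter values are arbitrary choices.

```python
import numpy as np
from scipy.spatial import cKDTree

rng = np.random.default_rng(0)
d, N, n_r = 3, 200_000, 256
cloud = rng.uniform(0.0, 1.0, size=(N, d))                    # uniform samples, rho = 1 on [0, 1]^3
g = lambda x: np.sin(2 * np.pi * x[:, 0]) + x[:, 1] ** 2      # a smooth test integrand

s = (np.arange(n_r) + 0.5) / n_r                              # midpoint rule in the parameter s
ray = np.array([0.2, 0.3, 0.4]) + s[:, None] * np.array([0.6, 0.4, 0.2])   # r(s) inside the cube

_, idx = cKDTree(cloud).query(ray)                            # nearest neighbors x_{k(s_j)}
approx = g(cloud[idx]).mean()                                 # quadrature of g at nearest neighbors
exact_quad = g(ray).mean()                                    # same quadrature of g on the ray itself
print(abs(approx - exact_quad))                               # small when N is large
```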

Undoubtedly M depends on the size of \(\varOmega\), which is controlled by h. The magnitude of h therefore becomes the crucial factor we need to determine. The following lemma motivates the choice of \(h=\lambda ^{\!-\frac{1}{d}+\varepsilon }\) for some small \(1\gg \varepsilon >0\).

Lemma 5

Under A1–A7, for a point cloud of cardinality \(N\sim {\text {Poi}}(\lambda )\) generated from a Poisson point process, and a ray \(\varvec{r}(s)\) given entirely in \(\text {supp}(\rho )\), the number of points \(N_\varOmega\) in the tubular neighborhood of radius \(H=\sqrt{2}h\) around \(\varvec{r}(s)\) will be bounded when

$$\begin{aligned}h =\lambda ^{\!-\frac{1}{d}+\varepsilon }\!\! \end{aligned}$$

for some small \(1\gg \varepsilon >0\), with probability \(\rightarrow 1\) as \(\lambda \rightarrow \infty\).

Proof

Note that the baguette region \(\varOmega\) has outer radius H, and hypercylinders of radius h are contained inside \(\varOmega\). For simplicity we prescribe h such that \(Q = \frac {1} {h}\) is an integer; then the baguette region \(\varOmega\) consists of Q hypercylinders, denoted by \(\{\varOmega _j\}_{j=1}^Q\), and the remaining region, denoted by \(\varOmega _r\), consisting of an annulus of outer radius H, inner radius h, and two half spheres of radius H on each side. Since the regions are disjoint, according to Appendix D.1 the Poisson process with rate \(\lambda\) in \(\text {supp}(\rho )\) induces a Poisson sub-process in each of the regions with a rate proportional to its Lebesgue measure, and all the sub-processes are independent.

Now, let \({\mathbb {P}}_Q\) denote the probability of having at least one point in each \(\varOmega _j\) in \(\{\varOmega _j\}_{j=1}^Q\) while the number of points in each \(\varOmega _j\) is also uniformly bounded by some constant \(N_Q\). Since each \(\varOmega _j\) has the same measure, their corresponding Poisson processes have the identical rate \(\lambda _q = |\varOmega _1 |\lambda\). Let \(N_j\) denote the Poisson random variable for \(\varOmega _j.\) Then,

$$\begin{aligned}{\mathbb {P}}(N_j\geqslant 1) = 1 - {\mathbb {P}}(N_j = 0) = 1 - \text{e}^{-\lambda _q}.\end{aligned}$$

Combined with (24) by requiring \(N_Q>\lambda _q\), this implies

$$\begin{aligned} {\mathbb {P}}(N_Q > N_j\geqslant 1) = {\mathbb {P}}(N_j\geqslant 1) - {\mathbb {P}}(N_j \geqslant N_Q) \geqslant 1 - \text{e}^{-\lambda _q} - \frac{\text{e}^{(N_Q-\lambda _q)}\lambda _q^{N_Q}}{N_Q^{N_Q}} ,\end{aligned}$$

and hence

$$\begin{aligned} {\mathbb {P}}_Q \geqslant \left( 1 - \text{e}^{-\lambda _q}-\frac{\text{e}^{(N_Q-\lambda _q)}\lambda _q^{N_Q}}{N_Q^{N_Q}} \right) ^Q \geqslant 1 - Q\left( \text{e}^{-\lambda _q} + \frac{\text{e}^{(N_Q-\lambda _q)}\lambda _q^{N_Q}}{N_Q^{N_Q}} \right) . \end{aligned}$$
(27)

The measure of the remaining region \(\varOmega _r\) is \(|\varOmega _r | = \omega _{d}H^{d} + \omega _{d-1}(H^{d-1}-h^{d-1})\), where \(\omega _d\) is the volume of the unit ball in \({\mathbb {R}}^d\). Therefore the Poisson process on \(\varOmega _r\) has rate \(\lambda _r = |\varOmega _r |\lambda\). Let \(N_r\) denote the corresponding Poisson random variable; again by (24) with \(N'>\lambda _r\):

$$\begin{aligned} {\mathbb {P}}(N_r < N') \geqslant 1 -\frac{\text{e}^{(N'-\lambda _r)}\lambda _r^{N'}}{N'^{N'}}. \end{aligned}$$
(28)

Since \(\varOmega _r\) and \(\bigcup \{\varOmega _j\}_{j=1}^Q\) are disjoint, by independence, the combined probability \(p_\textrm{tot}\) that all these events happen:

  (i) the number of points \(N_j\) in each hypercylinder \(\varOmega _j\) is at least 1,

  (ii) \(N_j\) is uniformly bounded above by some constant \(N_Q\),

  (iii) the number of points \(N_r\) in the remaining region \(\varOmega _r = \varOmega -\cup _j \{\varOmega _j\}\) is also bounded above by some constant \(N'\),

would have the lower bound:

$$\begin{aligned} p_\textrm{tot}&\geqslant \bigg ( 1 -\frac{\text{e}^{(N'-\lambda _r)}\lambda _r^{N'}}{N'^{N'}}\bigg ) \left ( 1 - Q \left(\text{e}^{-\lambda _q} + \frac{\text{e}^{(N_Q-\lambda _q)}\lambda _q^{N_Q}}{N_Q^{N_Q}} \right)\right ) \\ {}&\geqslant 1 - \frac{\text{e}^{(N'-\lambda _r)}\lambda _r^{N'}}{N'^{N'}} - Q\left(\text{e}^{-\lambda _q} + \frac{\text{e}^{(N_Q-\lambda _q)}\lambda _q^{N_Q}}{N_Q^{N_Q}} \right). \end{aligned}$$

Then with probability \(p_\textrm{tot}\), we have an upper bound for \(N_{\varOmega }\), the total number of points in \(\varOmega\):

$$\begin{aligned} N_{\varOmega } \leqslant N' + QN_Q. \end{aligned}$$
(29)

Clearly \(N_{\varOmega }\) and \(p_\textrm{tot}\) are inter-dependent: as we tighten the R.H.S. bound in (29) by choosing a smaller \(N'\) or \(N_Q\), the bound for \(p_\textrm{tot}\) will be loosened. From Lemma 4, we set \(N' = \alpha \lambda _r\), \(N_Q = \beta \lambda _q\) for some \(\alpha ,\beta >1\). Therefore, the next step is to determine the parameter set \(\{h,\,\alpha ,\,\beta \}\) to give a more balanced bound on the R.H.S. in (29) while still ensuring that the probability of the undesired events decays exponentially.

For that purpose we need some optimization. We know

$$\begin{aligned}\lambda _r = |\varOmega _r |\lambda =(\omega _{d}H^{d} +\omega _{d-1}(H^{d-1}-h^{d-1}))\lambda = \left(\omega _{d}2^{\frac{d}{2}}h^{d} + \omega _{d-1}\left(2^{\frac{d-1}{2}}-1\right)h^{d-1}\right)\lambda ; \end{aligned}$$
$$\begin{aligned} \lambda _q = |\varOmega _1 |\lambda = (\omega _{d-1}h^{d-1}h)\lambda = \omega _{d-1} h^d\lambda . \end{aligned}$$
(30)
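The rates in (30) are straightforward to evaluate numerically; the short helper below does so, assuming (as above) that \(\omega _d\) denotes the volume of the unit ball in \({\mathbb {R}}^d\). The function names and sample values are ours, for illustration only.

from math import gamma, pi, sqrt

# Rates lambda_q and lambda_r from (30), with omega_d taken to be the volume of
# the unit ball in R^d; helper names and sample values are illustrative only.
def unit_ball_volume(d):
    return pi ** (d / 2) / gamma(d / 2 + 1)

def rates(d, h, lam):
    H = sqrt(2) * h
    lam_q = unit_ball_volume(d - 1) * h ** d * lam              # one hypercylinder Omega_j
    lam_r = (unit_ball_volume(d) * H ** d
             + unit_ball_volume(d - 1) * (H ** (d - 1) - h ** (d - 1))) * lam   # remainder Omega_r
    return lam_q, lam_r

print(rates(d=3, h=0.05, lam=1e5))      # illustrative values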

We need to investigate how h should scale with \(\lambda\), so we assume \(h\sim \lambda ^{-p}\) for some constant p to be determined. The following optimization procedure motivates the choice of p. On the one hand, for the constraints, we need to ensure that the probability of each of the three events above failing decays to 0 as \(\lambda \rightarrow \infty\):

$$\begin{aligned} \frac{\text{e}^{(N'-\lambda _r)}\lambda _r^{N'}}{N'^{N'}} \rightarrow 0&\iff (N'-\lambda _r) + N'\log (\lambda _r) - N'\log (N') \rightarrow -\infty , \\ Q\text{e}^{-\lambda _q}\rightarrow 0&\iff \log (Q) - \lambda _q \rightarrow -\infty , \\ Q\frac{\text{e}^{(N_Q-\lambda _q)}\lambda _q^{N_Q}}{N_Q^{N_Q}} \rightarrow 0&\iff \log (Q) + (N_Q-\lambda _q) + N_Q\log (\lambda _q) - N_Q \log (N_Q) \rightarrow -\infty , \end{aligned}$$

and representing all the quantities in terms of \(\lambda ,\, p,\, \alpha ,\,\beta\) and simplifying

$$\begin{aligned} (\alpha -1)\lambda _r - \alpha \lambda _r\log (\alpha ) \rightarrow -\infty \quad&\implies \alpha \big (1-\log (\alpha )\big )<1&\implies \alpha>1 ,\\ p\log (\lambda ) - \omega _{d-1}\lambda ^{-pd+1} \rightarrow -\infty \quad&\implies -pd + 1 > 0&\implies p < \frac{1}{d}, \\ \log (Q) + (\beta -1)\lambda _q - \beta \lambda _q\log (\beta ) \rightarrow -\infty \quad&\implies \beta \big (1-\log (\beta )\big )<1&\implies \beta>1. \end{aligned}$$

On the other hand, for the objective, note that

$$\begin{aligned} N' + \frac{N_Q}{h}&= \alpha \left(\omega _{d}2^{\frac{d}{2}}h^{d} + \omega _{d-1} \left(2^{\frac{d-1}{2}}-1\right)h^{d-1}\right)\lambda + \frac{\beta }{h}\omega _{d-1} h^d\lambda \\ {}&\leqslant 3\max \bigg (\alpha \omega _{d}2^{\frac{d}{2}}h^{d}\lambda ,\; \alpha \omega _{d-1}\left (2^{\frac{d-1}{2}}-1\right)h^{d-1}\lambda ,\; \frac{\beta }{h}\omega _{d-1} h^d\lambda \bigg ), \end{aligned}$$

using that a sum of three non-negative terms is at most three times their maximum. Since \(\alpha\) and \(\beta\) are fixed constants greater than 1, we substitute \(h = \lambda ^{-p}\) and choose p to minimize this upper bound on the total number of points in \(\varOmega\):

$$\begin{aligned}&\qquad\quad{\mathop{\text{arg\,min\,max}}\limits_{h}} \bigg (\alpha \omega _{d}2^{\frac{d}{2}}h^{d}\lambda ,\, \alpha \omega _{d-1}\left(2^{\frac{d-1}{2}}-1 \right)h^{d-1}\lambda ,\, \frac{\beta }{h}\omega _{d-1} h^d\lambda \bigg ) \\& \iff {\mathop{\text{arg\,min\,max}}\limits_{p}} \bigg (c_2+(1-pd+p)\log (\lambda ) , \, c_3+(1-pd+p)\log (\lambda )\bigg ) \\& \iff {\mathop{\text{arg\,min\,max}}\limits_{p}} \bigg ((1-pd+p)\log (\lambda )\bigg ). \end{aligned}$$

The first term has exponent \(1-pd\) in \(\lambda\) and is thus dominated by the other two, whose common exponent is \(1-pd+p=1-(d-1)p\). Combined with the constraint \(p < \frac{1}{d}\) derived above, minimizing \(1-(d-1)p\) means maximizing p; therefore, we take \(p = \frac{1}{d}-\varepsilon\) for an arbitrarily small \(\varepsilon >0\).
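For concreteness, a worked instance of the resulting scaling (using only the quantities defined above): with \(h=\lambda ^{-p}\) and \(p=\frac{1}{d}-\varepsilon\), the dominant terms in (29) satisfy

$$\begin{aligned} N' + QN_Q = O\big (h^{d-1}\lambda \big ) = O\big (\lambda ^{1-(d-1)p}\big ) = O\big (\lambda ^{\frac{1}{d}+\varepsilon (d-1)}\big ), \end{aligned}$$

so, for example, in \(d = 3\) the number of points in \(\varOmega\) grows like \(\lambda ^{\frac{1}{3}+2\varepsilon }\) with high probability.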

Appendix D.3 Proof of Theorem 3

Proof of Theorem 3

Consider a Poisson process with rate \(\lambda\) on \(\text {supp}(\rho )\). As in Appendix D.2, let \(Q = \frac {1} {h}\) be an integer for simplicity (or take the ceiling if necessary), consider Q hypercylinders of radius h centered along \(\varvec{r}(s)\), and let \(\varOmega\) be the tubular neighborhood of distance \(H = \sqrt{2}h\) around \(\varvec{r}(s)\). Motivated by Appendix D.2, we set

$$\begin{aligned} h =\lambda ^{\!-\frac{1}{d}+\varepsilon } \end{aligned}$$

for some small constant \(0<\varepsilon \ll 1\) to be determined.

Divide the tubular neighborhood \(\varOmega\) into two parts: one consisting of the union of hypercylinders \(\bigcup _{j=1}^Q \varOmega _j\) around \(\varvec{r}(s)\), the other the remainder \(\varOmega _r\). Following the setting of Lemma 5, let \(N' = \alpha \lambda _r\) be the upper bound for the number of points in \(\varOmega _r\) and \(N_Q = \beta \lambda _q\) the bound for each \(\varOmega _j\). We set \(\alpha =\beta = \text {e} > 1\) (which also satisfies the constraints in Lemma 5) to simplify the calculations, so that \(N_Q = \text{e}\lambda _q\), \(N' = \text {e}\lambda _r\), and equation (27) becomes

$$\begin{aligned} {\mathbb {P}}(\text{e}\lambda _q> N_j\geqslant 1) = {\mathbb {P}}(N_j\geqslant 1) - {\mathbb {P}}(N_j \geqslant \text{e}\lambda _q) \geqslant 1 - 2\text{e}^{-\lambda _q} ,\end{aligned}$$
$$\begin{aligned} \implies {\mathbb {P}}_Q \geqslant \bigg (1 - 2\text{e}^{-\lambda _q}\bigg )^Q \geqslant 1 - 2Q\text{e}^{-\lambda _q}. \end{aligned}$$
(31)
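The simplification of the tail term here (and of the analogous bound for \(N_r\) below) is simply the substitution \(N_Q=\text{e}\lambda _q\) (respectively \(N'=\text{e}\lambda _r\)) into the Chernoff-type bound; spelled out for completeness,

$$\begin{aligned} \frac{\text{e}^{(\text{e}\lambda _q-\lambda _q)}\lambda _q^{\text{e}\lambda _q}}{(\text{e}\lambda _q)^{\text{e}\lambda _q}} = \text{e}^{(\text{e}-1)\lambda _q}\cdot \text{e}^{-\text{e}\lambda _q} = \text{e}^{-\lambda _q}. \end{aligned}$$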

So the total number of points in \(\bigcup _{j=1}^Q \varOmega _j\) is bounded by \(Q \text{e}\lambda _q\) while, with the above probability, there is still at least one point in every \(\varOmega _j\). On the other hand, for (28),

$$\begin{aligned} {\mathbb {P}}(N_r < \text{e}\lambda _r) \geqslant 1 - \text{e}^{-\lambda _r}.\end{aligned}$$

Again by the same independence argument, the total probability that all the events happen has the following lower bound:

$$\begin{aligned}p_\textrm{tot} \geqslant (1 - \text{e}^{-\lambda _r}) ( 1 - 2Q\text{e}^{-\lambda _q}) \geqslant 1 - 2Q\text{e}^{-\lambda _q} - \text{e}^{-\lambda _r}.\end{aligned}$$

Moreover, the total number of points inside \(\varOmega\) is bounded by

$$\begin{aligned} N_{\varOmega }\leqslant \text{e}\lambda _r + Q\text{e}\lambda _q = \text {e}(\lambda _r + Q\lambda _q) = \text {e}\lambda _{\varOmega },\end{aligned}$$

where \(\lambda _{\varOmega } = |\varOmega |\lambda\).

Now, when there is at least one point in each \(\varOmega _j\), the maximum distance from any point on \(\varvec{r}(s)\) to its nearest neighbor is at most \(H=\sqrt{2}h\), as argued above. Furthermore, under this setting, for any of the potential nearest neighbors, the length of the intersection of \(\varvec{r}(s)\) with its Voronoi cell is at most 3h. Therefore, the line integral error (26) is bounded by

$$\begin{aligned} &\quad \bigg |\int _0^1 g\big (\varvec{r}(s)\big ) \textrm{d}s - \int _0^1 g\big (\varvec{x}_{k(s)}\big )\textrm{d}s \bigg | \leqslant \frac{J}{2} \sum _{i=1}^M H\times 3h \leqslant \frac{J}{2} N_{\varOmega }\times H\times 3h \leqslant \frac{3\sqrt{2}J}{2} h^2 \text{e}\lambda _{\varOmega } \\ & \leqslant \frac{3\text{e}\sqrt{2}J}{2} h^2 \left(\omega _d2^{\frac{d}{2}}h^d + \omega _{d-1} 2^{\frac{d-1}{2}}h^{d-1}\right)\lambda \leqslant c(d,J) (h^{d+2}+h^{d+1})\lambda \leqslant c(d,J) \lambda ^{-\frac{1}{d}+\varepsilon (d+1)}. \end{aligned}$$

Finally, for the total probability \(p_\textrm{tot}\):

$$\begin{aligned}p_\textrm{tot} \geqslant 1 - 2Q\text{e}^{-\lambda _q} - \text{e}^{-\lambda _r}=1- \frac{2}{h}\text{e}^{-\omega _{d-1}h^d\lambda } - \text{e}^{-|\varOmega _r |\lambda },\end{aligned}$$

and recall from (30): \(\lambda _q = |\varOmega _1 |\lambda = \omega _{d-1}h^{d-1}h\lambda = \omega _{d-1}h^d\lambda = \omega _{d-1}\lambda ^{\varepsilon d}\). Then

$$\begin{aligned}2Q\text{e}^{-\lambda _q} = 2(\lambda )^{\frac{1}{d}-\varepsilon }\text{e}^{-\omega _{d-1}\lambda ^{\varepsilon d}} \rightarrow 0 \;\text { as } \lambda \rightarrow \infty .\end{aligned}$$

The above convergence can be shown by taking the natural log:

$$\begin{aligned} \ln \bigg (2\lambda ^{\frac{1}{d}-\varepsilon }\text{e}^{-\omega _{d-1}\lambda ^{\varepsilon d}}\bigg ) = \ln 2 + \bigg (\frac{1}{d}-\varepsilon \bigg )\ln \lambda -\omega _{d-1}\lambda ^{\varepsilon d}\rightarrow -\infty \;\text { as } \lambda \rightarrow \infty , \end{aligned}$$

since \(\ln {\lambda }\) grows more slowly than any positive power of \(\lambda\). As for the last term \(\text{e}^{-|\varOmega _r |\lambda }\):

$$\begin{aligned} \text{e}^{-|\varOmega _r |\lambda } \leqslant \text{e}^{-\omega _{d-1}(2^{\frac{d-1}{2}}-1)h^{d-1}\lambda }\leqslant \text{e}^{-c\lambda ^{\frac{1}{d}+\varepsilon (d-1)}} \leqslant \text{e}^{-c(d)\lambda ^{\frac{1}{d}} } \rightarrow 0 \;\text { as } \lambda \rightarrow \infty . \end{aligned}$$

Thus, \(p_\textrm{tot}\rightarrow 1\) as \(\lambda \rightarrow \infty\), and the line integral error is bounded by \(c(d,J) \lambda ^{-\frac{1}{d}+\varepsilon (d+1)} \rightarrow 0\) as long as \(\varepsilon <\frac{1}{(d+1)^2}\). To obtain the convergence in terms of the actual number of points N in the point cloud, we invoke Lemma 4 and set \(N=c\lambda\), which concludes the proof.
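For readers who want an empirical illustration of the rate in Theorem 3, the following is a minimal sketch under a simplified setup of our own (uniformly sampled points in the unit cube, a fixed segment, a smooth test function g, and N in place of \(\lambda\) via Lemma 4); it is not the authors' code and relies on generic SciPy nearest-neighbor queries.

import numpy as np
from scipy.spatial import cKDTree

# Minimal empirical sketch of the N^{-1/d} error decay for nearest-neighbor line
# integrals: N uniform points in [0,1]^d, a fixed segment r(s), smooth test g.
rng = np.random.default_rng(1)
d = 3

def g(x):                                   # smooth (Lipschitz) test integrand
    return np.cos(2 * np.pi * x[:, 0]) + x[:, 1] ** 2

a, b = np.full(d, 0.25), np.full(d, 0.75)   # segment kept away from the cube boundary
s = np.linspace(0.0, 1.0, 2001)
r = a + s[:, None] * (b - a)                # r(s) sampled on a fine grid in s
exact = g(r).mean()                         # reference value of the line integral over s

for N in [10**3, 10**4, 10**5, 10**6]:
    pts = rng.random((N, d))
    _, idx = cKDTree(pts).query(r)          # nearest neighbor x_{k(s)} for each sample of s
    err = abs(g(pts[idx]).mean() - exact)
    print(f"N = {N:>7d}   error = {err:.3e}")
# The observed error should decay no slower than roughly N^{-1/3} here, consistent
# with the upper bound of Theorem 3 for d = 3.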
