Abstract
We propose a new framework for the sampling, compression, and analysis of distributions of point sets and other geometric objects embedded in Euclidean spaces. Our approach involves constructing a tensor called the RaySense sketch, which captures nearest neighbors from the underlying geometry of points along a set of rays. We explore various operations that can be performed on the RaySense sketch, leading to different properties and potential applications. Statistical information about the data set can be extracted from the sketch, independent of the ray set. Line integrals on point sets can be efficiently computed using the sketch. We also present several examples illustrating applications of the proposed strategy in practical scenarios.
Data Availability
All the datasets used in this paper are well-known public datasets; they are readily available online.
Notes
RaySense does not require point clouds as input: we could apply RaySense directly to surface meshes, implicit surfaces, or even, given a fast nearest-neighbor calculator, CAD models directly.
References
Atzmon, M., Maron, H., Lipman, Y.: Point convolutional neural networks by extension operators. arXiv:1803.10091 (2018)
Bentley, J.L.: Multidimensional binary search trees used for associative searching. Commun. ACM 18(9), 509–517 (1975)
Besl, P.J., McKay, N.D.: Method for registration of 3-D shapes. In: Sensor Fusion IV: Control Paradigms and Data Structures, vol. 1611 (1992)
Caflisch, R.E.: Monte Carlo and quasi-Monte Carlo methods. Acta Numer. 7, 1–49 (1998)
Chang, A.X., Funkhouser, T., Guibas, L., Hanrahan, P., Huang, Q., Li, Z., Savarese, S., Savva, M., Song, S., Su, H., Xiao, J.X., Yi, L., Yu, F.: ShapeNet: an information-rich 3D model repository. arXiv:1512.03012 (2015)
Corless, R.M., Gonnet, G.H., Hare, D.E., Jeffrey, D.J., Knuth, D.E.: On the LambertW function. Adv. Comput. Math. 5(1), 329–359 (1996)
Curless, B., Levoy, M.: A volumetric method for building complex models from range images. In: SIGGRAPH 1996 Proceedings (1996)
Cutler, A., Breiman, L.: Archetypal analysis. Technometrics 36(4), 338–347 (1994)
Daley, D.J., Vere-Jones, D.: An Introduction to the Theory of Point Processes. Volume I: Elementary Theory and Methods. Springer, New York (2003)
De Bruijn, N.G.: Asymptotic Methods in Analysis, vol. 4. Courier Corporation, USA (1981)
Doerr, B.: Probabilistic tools for the analysis of randomized optimization heuristics. In: Theory of Evolutionary Computation, pp. 1–87. Springer, Cham, Switzerland (2020)
Draug, C., Gimpel, H., Kalma, A.: The Octave Image package (version 2.14.0) (2022). https://gnu-octave.github.io/packages/image
Dvoretzky, A., Kiefer, J., Wolfowitz, J.: Asymptotic minimax character of the sample distribution function and of the classical multinomial estimator. Ann. Math. Stat. 27(3), 642–669 (1956)
Eldar, Y., Lindenbaum, M., Porat, M., Zeevi, Y.Y.: The farthest point strategy for progressive image sampling. IEEE Trans. Image Process. 6(9), 1305–1315 (1997)
Fang, Y., Xie, J., Dai, G., Wang, M., Zhu, F., Xu, T., Wong, E.: 3D deep shape descriptor. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 2319–2328 (2015)
Graham, R., Oberman, A.M.: Approximate convex hulls: sketching the convex hull using curvature. arXiv:1703.01350 (2017)
Hadwiger, H.: Vorlesungen Über Inhalt, Oberfläche und Isoperimetrie, vol. 93. Springer, Berlin (1957)
Helgason, S.: The Radon Transform, vol. 2. Springer, New York (1980)
Indyk, P., Motwani, R.: Approximate nearest neighbors: towards removing the curse of dimensionality. In: Proceedings of the Thirtieth Annual ACM Symposium on Theory of Computing, pp. 604–613 (1998)
Ioffe, S., Szegedy, C.: Batch normalization: accelerating deep network training by reducing internal covariate shift. arXiv:1502.03167 (2015)
Jiménez, P., Thomas, F., Torras, C.: 3D collision detection: a survey. Comput. Gr. 25(2), 269–285 (2001)
Johnson, J., Douze, M., Jégou, H.: Billion-scale similarity search with GPUs. IEEE Trans. Big Data 7(3), 535–547 (2019)
Jones, P.W., Osipov, A., Rokhlin, V.: A randomized approximate nearest neighbors algorithm. Appl. Comput. Harmon. Anal. 34(3), 415–444 (2013)
Kazmi, I.K., You, L., Zhang, J.J.: A survey of 2D and 3D shape descriptors. In: 2013 10th International Conference Computer Graphics, Imaging and Visualization, pp. 1–10. IEEE (2013)
Kingma, D.P., Ba, J.: Adam: a method for stochastic optimization. arXiv:1412.6980 (2014)
Klain, D.A., Rota, G.-C.: Introduction to Geometric Probability. Cambridge University Press, Cambridge (1997)
Klokov, R., Lempitsky, V.: Escape from cells: deep KD-networks for the recognition of 3D point cloud models. In: Proceedings of the IEEE International Conference on Computer Vision, pp. 863–872 (2017)
Krig, S.: Interest point detector and feature descriptor survey. In: Computer Vision Metrics, pp. 187–246. Springer, Cham (2016)
LeCun, Y.: The MNIST database of handwritten digits (1998). http://yann.lecun.com/exdb/mnist/
Li, J.X., Chen, B.M., Lee, H.: SO-Net: self-organizing network for point cloud analysis. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 9397–9406 (2018)
Li, Y., Bu, R., Sun, M., Wu, W., Di, X., Chen, B.: PointCNN: convolution on X-transformed points. In: Advances in Neural Information Processing Systems, pp. 820–830 (2018)
Lin, M., Gottschalk, S.: Collision detection between geometric models: a survey. Proc. IMA Conf. Math. Surf. 1, 602–608 (1998)
Macdonald, C.B., Merriman, B., Ruuth, S.J.: Simple computation of reaction-diffusion processes on point clouds. Proc. Natl. Acad. Sci. 110(23), 9209–9214 (2013)
Macdonald, C.B., Miller, M., Vong, A., et al.: The Octave Symbolic package (version 3.0.1) (2022). https://gnu-octave.github.io/packages/symbolic
Mahalanobis, P.C.: On the generalised distance in statistics. Proc. Natl. Inst. Sci. India 2, 49–55 (1936)
Massart, P.: The tight constant in the Dvoretzky-Kiefer-Wolfowitz inequality. Ann. Probab. 18(3), 1269–1283 (1990)
Meurer, A., Smith, C.P., Paprocki, M., Čertík, O., Kirpichev, S.B., Rocklin, M., Kumar, A., Ivanov, S., Moore, J.K., Singh, S., Rathnayake, T., Vig, S., Granger, B.E., Muller, R.P., Bonazzi, F., Gupta, H., Vats, S., Johansson, F., Pedregosa, F., Curry, M.J., Terrel, A.R., Roučka, Š., Saboo, A., Fernando, I., Kulal, S., Cimrman, R., Scopatz, A.: SymPy: symbolic computing in Python. PeerJ Comput. Sci. 3, e103 (2017). https://doi.org/10.7717/peerj-cs.103
Nair, V., Hinton, G.E.: Rectified linear units improve restricted Boltzmann machines. In: Proceedings of the 27th International Conference on Machine Learning (ICML-10), pp. 807–814 (2010)
Natterer, F.: The Mathematics of Computerized Tomography. Society for Industrial and Applied Mathematics, USA (2001)
Osting, B., Wang, D., Xu, Y., Zosso, D.: Consistency of archetypal analysis. SIAM J. Math. Data Sci. 3(1), 1–30 (2021)
Paszke, A., Gross, S., Chintala, S., Chanan, G., Yang, E., DeVito, Z., Lin, Z., Desmaison, A., Antiga, L., Lerer, A.: Automatic differentiation in PyTorch. In: NIPS Autodiff Workshop (2017)
Peyré, G., Cuturi, M., et al.: Computational optimal transport: with applications to data science. Found. Trends® Mach. Learn. 11(5–6), 355–607 (2019)
Pollard, D.: Convergence of Stochastic Processes. Springer, New York (1984)
Qi, C.R., Su, H., Mo, K., Guibas, L.J.: PointNet: deep learning on point sets for 3D classification and segmentation. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (2017)
Qi, C.R., Yi, L., Su, H., Guibas, L.J.: PointNet++: deep hierarchical feature learning on point sets in a metric space. In: Advances in Neural Information Processing Systems, pp. 5099–5108 (2017)
Radon, J.: Über die Bestimmung von Funktionen durch ihre Integralwerte längs gewisser Mannigfaltigkeiten. Class. Pap. Mod. Diagn. Radiol. 5(21), 124 (2005)
Rostami, R., Bashiri, F.S., Rostami, B., Yu, Z.: A survey on data-driven 3D shape descriptors. Comput. Graph. Forum 38(1), 356–393 (2019)
Ruuth, S.J., Merriman, B.: A simple embedding method for solving partial differential equations on surfaces. J. Comput. Phys. 227(3), 1943–1961 (2008)
Santaló, L.A.: Integral Geometry and Geometric Probability. Cambridge University Press, New York (2004)
Sedaghat, N., Zolfaghari, M., Amiri, E., Brox, T.: Orientation-boosted voxel nets for 3D object recognition. arXiv:1604.03351 (2016)
Settles, B.: Active learning literature survey. Technical report, University of Wisconsin-Madison Department of Computer Sciences (2009)
Shen, Y., Feng, C., Yang, Y., Tian, D.: Mining point cloud local structures by kernel correlation and graph pooling. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR) (2018)
Simonovsky, M., Komodakis, N.: Dynamic edge-conditioned filters in convolutional neural networks on graphs. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 3693–3702 (2017)
Sinha, A., Bai, J., Ramani, K.: Deep learning 3D shape surfaces using geometry images. In: European Conference on Computer Vision. Springer, Berlin (2016)
Solmon, D.C.: The X-ray transform. J. Math. Anal. Appl. 56(1), 61–83 (1976)
The mpmath development team: mpmath: a Python library for arbitrary-precision floating-point arithmetic (version 1.2.1). (2021). https://mpmath.org/
Trefethen, L.N., Weideman, J.A.C.: The exponentially convergent trapezoidal rule. SIAM Rev. 56(3), 385–458 (2014)
Tsai, Y.-H.R.: Rapid and accurate computation of the distance function using grids. J. Comput. Phys. 178(1), 175–195 (2002)
Villani, C.: Optimal Transport: Old and New, vol. 338. Springer, Berlin (2009)
Wang, P.-S., Liu, Y., Guo, Y.-X., Sun, C.-Y., Tong, X.: O-CNN: Octree-based convolutional neural networks for 3D shape analysis. ACM Trans. Gr. (TOG) 36(4), 72 (2017)
Wang, Y., Sun, Y., Liu, Z., Sarma, S.E., Bronstein, M.M., Solomon, J.M.: Dynamic graph CNN for learning on point clouds. ACM Trans. Gr. (TOG) 38(5), 146 (2019)
Wu, Z., Song, S., Khosla, A., Yu, F., Zhang, L., Tang, X., Xiao, J.: 3D ShapeNets: a deep representation for volumetric shapes. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 1912–1920 (2015)
Xia, F., et al.: PointNet.pytorch Git repository. https://github.com/fxia22/pointnet.pytorch
Xie, J., Dai, G., Zhu, F., Wong, E.K., Fang, Y.: Deepshape: deep-learned shape descriptor for 3D shape retrieval. IEEE Trans. Pattern Anal. Mach. Intell. 39(7), 1335–1345 (2016)
Zhou, Y., Tuzel, O.: VoxelNet: end-to-end learning for point cloud based 3D object detection. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 4490–4499 (2018)
Acknowledgements
Part of this research was performed while Macdonald and Tsai were visiting the Institute for Pure and Applied Mathematics (IPAM), which is supported by the National Science Foundation (Grant No. DMS-1440415). This work was partially supported by a grant from the Simons Foundation, NSF Grants DMS-1720171 and DMS-2110895, and a Discovery Grant from Natural Sciences and Engineering Research Council of Canada. The authors thank the Texas Advanced Computing Center (TACC) and UBC Math Dept Cloud Computing for providing computing resources.
Author information
Authors and Affiliations
Corresponding author
Ethics declarations
Conflict of Interest
The authors have no competing interests to declare that are relevant to the content of this article.
Appendices
Appendix A Examples of Ray Distributions
We assume all point sets are properly calibrated by a common preprocessing step. This calibration could also be learned; in fact, one can use RaySense to train such a preprocessor to register the dataset, for example using Sect. 4.4 or similar. However, for simplicity, in our experiments we generally normalize each point set to lie in the unit \(\ell ^2\) ball, with center of mass at the origin, unless otherwise indicated.
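As a concrete illustration, the following numpy sketch performs this normalization (the function name is ours):

```python
import numpy as np

def normalize_point_set(X):
    """Center a point set at the origin and rescale it into the unit l2 ball,
    mirroring the preprocessing described in the text."""
    X = np.asarray(X, dtype=float)
    X = X - X.mean(axis=0)                 # center of mass at the origin
    radius = np.linalg.norm(X, axis=1).max()
    if radius > 0:
        X = X / radius                     # farthest point lands on the unit sphere
    return X
```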
We present two ways to generate random rays. There is no right way to generate rays, although it is conceivable that one may find optimal ray distributions for specific applications.
Method R1 One simple approach is to generate rays of a fixed length L, whose direction \(\varvec{v}\) is sampled uniformly from the unit sphere. We add a shift \(\varvec{b}\) sampled uniformly from \([-\frac{1}{2}, \frac{1}{2}]^d\) to avoid biasing rays toward the origin. The \(n_r\) sample points are distributed evenly along the ray:
The spacing between adjacent points on each ray, denoted \(\delta r\), is \(\frac{L}{n_r-1}\). We use \(L=2\).
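A numpy sketch of Method R1 (the function name is ours, and centering each ray at the shift \(\varvec{b}\) is one plausible reading of the construction):

```python
import numpy as np

def method_r1(n_rays, n_r=16, L=2.0, d=3, rng=None):
    """Method R1: direction v uniform on the unit sphere, shift b uniform
    in [-1/2, 1/2]^d, and n_r evenly spaced samples along a ray of fixed
    length L (here centered at b).  Returns an (n_rays, n_r, d) array."""
    rng = np.random.default_rng() if rng is None else rng
    v = rng.standard_normal((n_rays, d))
    v /= np.linalg.norm(v, axis=1, keepdims=True)   # uniform random direction
    b = rng.uniform(-0.5, 0.5, size=(n_rays, d))    # shift avoids origin bias
    t = np.linspace(-L / 2, L / 2, n_r)             # spacing delta_r = L/(n_r-1)
    return b[:, None, :] + t[None, :, None] * v[:, None, :]
```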
Method R2 Another natural way to generate random rays is by random endpoint selection: choose two random points \(\varvec{p}\), \(\varvec{q}\) on a sphere and connect them to form a ray, then sample \(n_r\) evenly spaced points between \(\varvec{p}\) and \(\varvec{q}\). To avoid overly short rays, on which the sampled information would be redundant, we discard rays shorter than a minimum length threshold \(\tau\). Note that the spacing between the \(n_r\) sample points differs from ray to ray:
The spacing of points on each ray varies, depending on the length of the ray.
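A corresponding sketch of Method R2 (names ours; rejection sampling enforces the minimum length threshold \(\tau\)):

```python
import numpy as np

def method_r2(n_rays, n_r=16, d=3, tau=0.5, rng=None):
    """Method R2: connect two uniform random points p, q on the unit
    sphere, discarding rays shorter than tau, and place n_r evenly
    spaced samples between the endpoints."""
    rng = np.random.default_rng() if rng is None else rng
    rays = []
    while len(rays) < n_rays:
        p, q = rng.standard_normal((2, d))
        p, q = p / np.linalg.norm(p), q / np.linalg.norm(q)
        if np.linalg.norm(p - q) < tau:             # overly short ray: discard
            continue
        t = np.linspace(0.0, 1.0, n_r)[:, None]
        rays.append((1 - t) * p + t * q)            # per-ray spacing varies
    return np.asarray(rays)                         # (n_rays, n_r, d)
```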
Figure 13 shows the density of rays from the ray generation methods. In this paper, we use Method R1; a fixed \(\delta r\) seems to help maintain spatial consistency along the rays, which increases RayNN’s classification accuracy in Sect. 4.5.
Appendix B Implementation Details of RayNN
Our implementation uses PyTorch [41].
Architecture RayNN takes the \(m\times k \times c\) RaySense sketch tensor \({\mathcal {S}}(\varGamma )\) as input, and outputs a K-vector of probabilities, where K is the number of object classes.
The first few layers of the network are blocks of 1D convolution followed by max-pooling that encode the sketch into a single feature vector per ray; convolution and max-pooling are applied along each ray. After this downsizing, we apply a max operation across rays. Figure 14 includes further details. The output of the max-pooling layer is fed into fully connected layers with output sizes 256, 64, and K to produce the desired vector of probabilities \(\varvec{{\textbf{p}}}_i \in {\mathbb {R}}^K\). Batch normalization [20] and ReLU [38] are used for every fully connected and convolution layer.
Note that our network uses convolution along rays to capture local information while the fully connected layers aggregate global information. Between the two, the max operation across rays ensures invariance to the ordering of the rays. It also allows for an arbitrary number of rays to be used during inference. These invariance properties are similar to PointNet’s input-order invariance [44].
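A minimal numpy sketch of the max aggregation across rays, illustrating the two invariance properties noted above (the function name is ours; in RayNN the input is the learned per-ray feature matrix):

```python
import numpy as np

def aggregate_across_rays(per_ray_features):
    """Symmetric aggregation between the convolutional and fully connected
    stages: a max over the ray axis, so the result is invariant to ray
    order and accepts any number of rays."""
    return np.asarray(per_ray_features).max(axis=0)   # (m, c) -> (c,)
```

Permuting the rays, or changing their number, only reorders or resizes the first axis, so the aggregated feature vector is unaffected by ray ordering.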
Data We apply RayNN to the standard ModelNet10 and ModelNet40 benchmarks [62] for 3D object classification. ModelNet40 consists of 12 311 orientation-aligned [50] meshed 3D CAD models, divided into 9 843 training and 2 468 test objects. ModelNet10 contains 3 991 training and 908 test objects. Following the experimental setup in [44], we sample \(N=1\,024\) points from each of these models and rescale them to be bounded by the unit sphere to form point sets. Our results do not appear to be sensitive to N.
Training During training, we use dropout with ratio 0.5 on the penultimate fully connected layer. We also augment our training dataset on-the-fly by adding \({\mathcal {N}}(0, 0.000\,4)\) noise to the coordinates. For the optimizer, we use Adam [25] with momentum 0.9 and batch size 16. The learning rate starts at 0.002 and is halved every 100 epochs.
Inference Our algorithm uses random rays, so it is natural to consider strategies to reduce the variance in the prediction. We consider one simple approach during inference by making an ensemble of predictions from \(\lambda\) different ray sets. The ensemble prediction is based on the average over the \(\lambda\) different probability vectors \({{\textbf{p}}}_i \in {\mathbb {R}}^K\), i.e.,
The assigned label then corresponds to the entry with the largest probability. We denote the number of rays used during training by m, while the number of rays used for inference is \({\hat{m}}\). Unless otherwise specified, we use \(\lambda =8\), \(m=32\) rays, and \({\hat{m}}=m\).
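In pure Python, the ensemble rule amounts to (a sketch; names ours):

```python
def ensemble_predict(prob_vectors):
    """Average lambda probability vectors (one per ray set) and return the
    index of the largest averaged probability, i.e., the assigned label."""
    lam = len(prob_vectors)
    K = len(prob_vectors[0])
    avg = [sum(p[k] for p in prob_vectors) / lam for k in range(K)]
    return max(range(K), key=lambda k: avg[k])
```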
Appendix C Details of the Proof of Theorem 1
This appendix contains the proofs of Lemmas 1, 2, and 3, and Theorem 1.
Proof of Lemma 1
The probability measure of \(\varOmega _j\) is
which represents the probability of sampling \(\varOmega _j\) when drawing i.i.d. random samples from \(\mu\). For a fixed set of such hypercubes, any \(\varvec{x}\in \text {supp}(\rho )\) will fall in one of the \(\varOmega _j\)’s. Then one can define a mapping \(h{:}\text {supp}(\rho )\subset {\mathbb {R}}^d\rightarrow {\mathbb {R}}\) by
By applying the mapping to the random vector \(\varvec{X}\), we obtain a new discrete random variable S with the discrete probability distribution \(\mu _M\) on \({\mathbb {R}}\) and the corresponding density \(\rho _M\). The random variable S lives in a discrete space \(S \in \{0,\,1,\,\cdots ,\,M-1\}\) and \(\rho _M\) is given as a sum of delta spikes as
As a result, sampling from the distribution \(\mu _M\) is equivalent to sampling the hypercubes according to the distribution \(\mu\) in \({\mathbb {R}}^d\), caring only about which hypercube \(\varOmega _j\) a sample falls in, not its precise location. Let \(F_M(s)\) denote the cumulative distribution function associated with the density function \(\rho _M(s)\).
Now, given a set of N independent samples of \(\varvec{X}{:}\) \(\{\varvec{X}_i\}_{i=1}^N\subset {\mathbb {R}}^d\), we have a corresponding set of N independent sample points of S: \(\{s_i\}_{i=1}^N\) such that \(\varvec{X}_i\in \varOmega _{s_i+1}\). From there, we can regard the histogram of \(\{s_i\}_{i=1}^N\) as an empirical density of the true density \(\rho _M\). Denote the empirical density by \({\tilde{\rho }}_M^N\) which is given by
One can therefore also obtain an empirical cumulative distribution function \({\tilde{F}}^N_M(s)\) using the indicator function \(\chi\):
By the Dvoretzky-Kiefer-Wolfowitz inequality [13, 36], we have
Therefore, for a desired fixed probability \(p_0\), the above indicates that the approximation error of the empirical \({\tilde{F}}^N_M(s)\) is at most
with probability at least \(p_0\). Then note that the true probability measure \(P_{\varOmega _j}\) of \(\varOmega _j\) being sampled by random drawings from \(\mu\) is equivalent to the true probability of \(j-1\) being drawn from \(\mu _M\), i.e.,
therefore, \(P_{\varOmega _j} = P_M(j-1)\) can be computed from \(F_M\) by
Taking absolute value and using the triangle inequality, with the fixed \(p_0\)
where \({\tilde{P}}^N_M(j-1)\) denotes the empirical probability at \(j-1\). Applying the same argument to \({\tilde{P}}^N_M(j-1),\) one has
For a set of N sample points, \({\tilde{P}}^N_M(j-1)\) is computed as \(\frac{N_j}{N}\), where \(N_j\) is the number of times \(j-1\) is sampled by \(\{s_i\}_{i=1}^N\), or equivalently, the number of times \(\varOmega _j\) is sampled by \(\{\varvec{X}_i\}_{i=1}^N\). In practice, therefore, with probability at least \(p_0\), the number of sample points \(N_j\) in \(\varOmega _j\) satisfies the following bound:
By taking N large enough that \(P_{\varOmega _j}N - 2\varepsilon _N N = 1\), we ensure \(N_j \geqslant 1\):
The above quantity is clearly a function with respect to the probability measure \(P_{\varOmega _j}\), and any \(\varOmega _i\) with \(P_{\varOmega _i}\geqslant P_{\varOmega _j}\) would have \(N_i\geqslant N_j\geqslant 1\). Using \(\nu\) to denote such a function and \(0<P\leqslant 1\) as the threshold measure completes the first part of the proof:
To establish the bounds on the expression, we note
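The DKW step above can be made concrete: with Massart's tight constant, requiring \(2\text{e}^{-2N\varepsilon ^2} \leqslant 1-p_0\) and solving for \(\varepsilon\) gives the error level \(\varepsilon _N\). A small sketch (function name ours):

```python
import math

def dkw_epsilon(N, p0):
    """Sup-norm error of the empirical CDF: with probability at least p0,
    sup_s |F~_N(s) - F(s)| <= sqrt(log(2/(1-p0)) / (2N)),
    from the DKW inequality with Massart's tight constant."""
    return math.sqrt(math.log(2.0 / (1.0 - p0)) / (2.0 * N))
```

As expected, the error level decays like \(N^{-\frac{1}{2}}\) as the number of samples grows.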
Proof of Lemma 2
Consider a local hypercube centered at \(\varvec{y}\), \(\varOmega _{\varvec{y}}:= \{\varvec{x}+\varvec{y}\in {\mathbb {R}}^d{:} \Vert \varvec{x}\Vert _\infty \leqslant \frac{l}{2}\}\), of side length l to be determined. We shall simply say "cube". The probability of the cube \(\varOmega _{\varvec{y}}\) being sampled is given by \(P_{\varOmega _{\varvec{y}}} = \int _{\varOmega _{\varvec{y}}} \rho (\varvec{x})\textrm{d}\varvec{x}\). Now, for the set of standard basis vectors \(\{\varvec{e}_i\}_{i=1}^d\), let \(\varvec{v}_d\) denote their sum: \(\varvec{v}_d:=\sum _{i=1}^d \varvec{e}_i\). Without loss of generality, the probability of a diagonal cube, defined by \(\varOmega _{\varvec{y}_d}:=\{\varvec{x}+\varvec{y}+l\varvec{v}_d \in {\mathbb {R}}^d{:} \Vert \varvec{x}\Vert _\infty \leqslant \frac{l}{2}\}\), being sampled (unconditional on \(\varOmega _{\varvec{y}}\) being sampled) has the following bound by the Lipschitz continuity of \(\rho\):
Furthermore, \(P_{\varOmega _{\varvec{y}}}\) has the following lower bound, also by the Lipschitz continuity of \(\rho\). For any \(\varvec{x} \in \varOmega _{\varvec{y}}\), we have
Combining with the previous bound for \(P_{\varOmega _{\varvec{y}_d}}\), we further have
By setting \(\rho (\varvec{y}) > \frac{3\sqrt{d}}{2}Ll\) we can ensure \(P_{\varOmega _{\varvec{y}_d}} >0\), but this extreme lower bound relies only on Lipschitz continuity. To obtain a more useful bound, we show below that by picking \(l:= l_N\) judiciously with \(\rho (\varvec{y})> 3\sqrt{d}Ll_N>0\), every surrounding cube has non-zero probability of being sampled. Therefore, with \(\rho (\varvec{y}) > 3\sqrt{d}Ll_N\), for any diagonal cube \(\varOmega _{\varvec{y}_d}\):
Since the diagonal cube is the farthest from \(\varvec{y}\) among all the surrounding cubes, every surrounding cube of \(\varOmega _{\varvec{y}}\) has probability measure at least \(P_{\varOmega _{\varvec{y}_d}}.\)
According to Lemma 1, for N sampling points, with probability at least \(p_0\), if a region has probability measure \(\geqslant P_N\), then there is at least one point sampled in that region, where \(P_N\) is the threshold probability depending on N obtained by solving the equation below:
By the bounds for N in (17) of Lemma 1, we know there is some constant \(c\in (1,3)\) such that
Solving the above quadratic equation and noting that \(P_N>0\), we have
Therefore, for a fixed N, by requiring
we have, with probability at least \(p_0\), at least one point in every surrounding cube of \(\varOmega _{\varvec{y}}\) of side length \(l_N\). This lower bound on \(l_N\) ensures that each surrounding cube has enough probability measure to be sampled. Since \(1<c<3\), we can simply take \(l_N\) to be
From above we see that for a fixed \(\rho (\varvec{y})\), \(l_N\) decreases as N increases. Therefore, by choosing N large enough, we can always satisfy the prescribed assumption \(\rho (\varvec{y})\geqslant 3\sqrt{d}Ll_N\).
Furthermore, when N is large enough that \(\rho (\varvec{y})\geqslant 3\sqrt{d}Ll_N\) always holds, \(l_N\) is a decreasing function of \(\rho\): with a higher local density \(\rho (\varvec{y})\), a smaller \(l_N\) suffices for the sampling statement to hold, i.e., the local region is more compact.
Finally, since there is a point in every surrounding cube of \(\varOmega _{\varvec{y}}\), the diameter of the Voronoi cell of \(\varvec{y}\) has the following upper-bound with the desired probability \(p_0\):
Now, for a sample point \(\varvec{x}_0\) in the interior of \(\text {supp}(\rho )\), given a cover of cubes as in Lemma 1, \(\varvec{x}_0\) must belong to one of the cubes, whose center we also denote by \(\varvec{y}\) with a slight abuse of notation. Note that the diameter of \(V(\varvec{x}_0)\) then has the same upper bound as shown above. To pass from \(\rho (\varvec{y})\) to \(\rho (\varvec{x}_0)\), by Lipschitz continuity, \(\rho (\varvec{y})\geqslant \rho (\varvec{x}_0) - \frac{L\sqrt{d}}{2}l_N \implies \rho (\varvec{x}_0) \leqslant \rho (\varvec{y})+\frac{L\sqrt{d}}{2}l_N\). Since we require \(\rho (\varvec{y})\geqslant 3\sqrt{d}Ll_N\), we have \(\rho (\varvec{x}_0)\leqslant \frac{\rho (\varvec{y})}{6}+\rho (\varvec{y})=\frac{7}{6}\rho (\varvec{y})\). Therefore,
Proof of Lemma 3
Without loss of generality, we assume \(|\text {supp}(\rho ) |=1\); then \(\rho =1\) everywhere on its support. We partition \(\text {supp}(\rho )\) into M regions, each with probability measure \(\frac{1}{M}\). This partition can be constructed as follows: for most of the interior of \(\text {supp}(\rho )\), subdivide into hypercubes \(\varOmega _j\) of equal size such that \(P_{\varOmega _j} = \frac{1}{M}\) and each \(\varOmega _j\) is contained completely inside \(\text {supp}(\rho )\). The side length l of each hypercube is then determined by \(\frac{l^d}{|\text {supp}(\rho ) |} = \frac{1}{M} \implies l = M^{-\frac{1}{d}}\). For the remaining uncovered regions of \(\text {supp}(\rho )\), cover them with small cubes of appropriate sizes and combine them to obtain regions of measure \(\frac{1}{M}\).
Then, following a similar idea as in Lemma 2, one has a discrete sampling problem with equal probability for each candidate, which resembles the coupon collector problem. The probability p(N, d, M) that each of the M regions contains at least one sample point has a well-known lower bound [11]:
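As a sanity check on this bound, which takes the form \(p(N,d,M) \geqslant 1 - M\text{e}^{-\frac{N}{M}}\) (cf. the quantity \(M\text{e}^{-\frac{N}{M}}\) used below), one can compare it with the exact coverage probability computed by inclusion-exclusion (function names ours):

```python
import math

def coverage_exact(N, M):
    """Exact probability that N uniform draws over M equal-measure regions
    hit every region, via inclusion-exclusion."""
    return sum((-1) ** k * math.comb(M, k) * (1 - k / M) ** N
               for k in range(M + 1))

def coverage_lower_bound(N, M):
    """Coupon-collector style lower bound 1 - M * exp(-N/M)."""
    return 1.0 - M * math.exp(-N / M)
```

The union bound gives \({\mathbb {P}}(\text {some region empty}) \leqslant M(1-\frac{1}{M})^N \leqslant M\text{e}^{-\frac{N}{M}}\), which is why the exact probability always dominates the lower bound.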
With the probability p(N, d, M) given above, for an interior hypercube we again have at least one sample in each of its surrounding hypercubes, since now there is at least one sample in each of the M regions. The Voronoi diameter of each point is then at most \(3l\sqrt{d}\). Fixing a desired probability \(p_0\), we want to determine the number of regions M so as to control l. We need a bound of the following form:
Rearranging, the above holds with equality only when
The above equation is solvable by using the Lambert W function:
where \(W_0\) is the principal branch of the Lambert W function. Note that the Lambert W function satisfies
Plugging in the above identity, one has
Also note that the function \(M\text{e}^{-\frac{N}{M}}\) is monotonically increasing in M (for \(M>0\)), so for the bound in (22) to hold we require
By taking the largest possible integer M satisfying the above inequality, we then have
for every hypercube contained in \(\text {supp}(\rho )\). Then this yields a uniform bound for the Voronoi diameter of any point that is in an interior hypercube surrounded by other interior hypercubes:
In terms of the limiting behavior, for large x, the Lambert W function is asymptotic to the following [6, 10]:
with \(c(x)\rightarrow 1\) as \(x\rightarrow \infty\). Therefore, for sufficiently large \(\frac {N} {\left(1-p_0 \right)}\), we have
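To make the Lambert W step concrete: the equality \(M\text{e}^{-\frac{N}{M}} = 1-p_0\) can be solved numerically, since substituting \(t = \frac{N}{M}\) turns it into \(t\text{e}^{t} = \frac{N}{1-p_0}\), so \(t = W_0\big (\frac{N}{1-p_0}\big )\) and \(M = \frac{N}{t}\). A pure-Python sketch using Newton's iteration (function names ours; flooring gives the largest admissible integer M):

```python
import math

def lambert_w0(x, tol=1e-12):
    """Principal branch W_0 for x > 0, via Newton's iteration on w*e^w = x."""
    w = math.log(1.0 + x)                     # reasonable starting guess
    for _ in range(100):
        e = math.exp(w)
        step = (w * e - x) / (e * (w + 1.0))
        w -= step
        if abs(step) < tol:
            break
    return w

def largest_M(N, p0):
    """Real-valued solution of M * exp(-N/M) = 1 - p0; since M*exp(-N/M)
    is increasing in M, floor(M) is the largest admissible integer."""
    return N / lambert_w0(N / (1.0 - p0))
```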
Proof of Theorem 1
Note that when using one ray, \(S[\varGamma _1](1, j) = \varvec{x}_{1[j]}\) and \(S[\varGamma _2](1, j) = \varvec{x}_{2[j]}\) for \(j=1,2,\cdots ,n_r\). The main idea is to bound the difference between each pair of points using the results of the previous lemmas. Consider a fixed sampling point \(\varvec{r}_{1,j}\in \varvec{r}(s)\) whose corresponding closest points are \(\varvec{x}_{1[j]}\) and \(\varvec{x}_{2[j]}\) in \(\varGamma _1\) and \(\varGamma _2\), respectively. We consider two cases. First, when \(\varvec{r}_{1,j}\) is interior to \(\text {supp}(\rho )\), Lemmas 2 and 3 give, with probability \(p_0\), a bound on the diameter of the Voronoi cell of any interior \(\varvec{x}\), denoted \(D(\varvec{x})\), where \(\rho ,\,p_0,\, N,\) and d are assumed fixed. Therefore,
Then by the triangle inequality: \(\Vert \varvec{x}_{1[j]}-\varvec{x}_{2[j]} \Vert _2 \leqslant D(\varvec{x}_{1[j]})+ D(\varvec{x}_{2[j]})\), which applies for all sampling points \(\varvec{r}_{1,j}\)’s in the interior of \(\text {supp}(\rho )\), and as \(N\rightarrow \infty\) we have \(D\rightarrow 0\) in a rate derived in Lemma 2.
In the case where the sampling point \(\varvec{r}_{1,j}\in \varvec{r}(s)\) is outside of \(\text {supp}(\rho )\), since \(\text {supp}(\rho )\) is convex, the closest point to \(\varvec{r}_{1,j}\) in \(\text {supp}(\rho )\) is always unique; denote it by \(\varvec{x}_{\rho }\). Then choose \(R_1\), depending on \(N_1,\,N_2\), such that the probability measure \(P_1 = P\big (B_{R_1}(\varvec{x}_{\rho })\cap \text {supp}(\rho ) \big )\) reaches the threshold introduced in Lemma 1, so that there are points \(\varvec{x}_{\rho ,1}\in \varGamma _1\) and \(\varvec{x}_{\rho ,2}\in \varGamma _2\) lying in \(B_{R_1}(\varvec{x}_{\rho })\cap \text {supp}(\rho )\). For sufficiently large N, \(\varvec{x}_{1[j]}\) and \(\varvec{x}_{2[j]}\) are points inside \(B_{R_1}(\varvec{x}_{\rho })\cap \text {supp}(\rho )\) since \(\text {supp}(\rho )\) is convex. Then we have
and we can pick \(N_1, N_2\) large to make \(R_1\) as small as desired. Therefore, we have
in probability for both interior sampling points and outer sampling points \(\varvec{r}_{1,j}\), and the convergence starts when \(N_1, N_2\) get sufficiently large. Consequently, for the RaySense matrices \(S[\varGamma _1]\) and \(S[\varGamma _2]\), we can always find N sufficiently large such that
for arbitrarily small \(\varepsilon\) depending on N, \(n_r\), d, and the geometry of \(\text {supp}(\rho )\).
Remark 3
In the case of non-convex \(\text {supp}(\rho )\) with the sampling point \(\varvec{r}_{1,j}\in \varvec{r}(s)\) outside of \(\text {supp}(\rho )\): if the ray \(\varvec{r}\) is drawn from some distribution \({\mathcal {L}}\), then with probability one \(\varvec{r}_{1,j}\) is not equidistant to two or more points of \(\text {supp}(\rho )\), so the closest point is uniquely determined, and we only need to worry about the case where \(\varvec{r}_{1,j}\) finds a closest point \(\varvec{x}_{2[j]}\) in \(\varGamma _2\) that is far away from \(\varvec{x}_{1[j]}\).
Let \(\varvec{x}_{\rho }\) be the closest point to \(\varvec{r}_{1,j}\) in \(\text {supp}(\rho )\). Similarly, choose \(R_1\), depending on \(N_1,N_2\), such that the probability measure \(P_1 = P\big (B_{R_1}(\varvec{x}_{\rho })\cap \text {supp}(\rho ) \big )\) reaches the threshold introduced in Lemma 1, so that there are points \(\varvec{x}_{\rho ,1}\in \varGamma _1\) and \(\varvec{x}_{\rho ,2}\in \varGamma _2\) lying in \(B_{R_1}(\varvec{x}_{\rho })\cap \text {supp}(\rho )\). Now, consider the case where the closest point \(\tilde{\varvec{x}}_{\rho }\) to \(\varvec{r}_{1,j}\) in the partial support \(\text {supp}(\rho )\setminus B_{R_1}(\varvec{x}_{\rho })\) is far from \(\varvec{x}_{\rho }\) due to the non-convex geometry, and denote
We pick N so large that
implying we can find \(\varvec{x}_{\rho ,1}, \varvec{x}_{\rho ,2}\) from \(\varGamma _1\) and \(\varGamma _2\) that are closer to \(\varvec{r}_{1,j}\) than \(\tilde{\varvec{x}}_{\rho }\) is. Therefore, for sufficiently large \(N_1\) and \(N_2\), we can find both closest points \(\varvec{x}_{1[j]}\), \(\varvec{x}_{2[j]}\) of \(\varvec{r}_{1,j}\) inside \(B_{R_1}(\varvec{x}_{\rho })\cap \text {supp}(\rho )\) from \(\varGamma _1\) and \(\varGamma _2\), so that \(\Vert \varvec{x}_{1[j]}-\varvec{x}_{2[j]} \Vert _2 \leqslant 2R_1\). The rest follows exactly as in the previous proof.
Appendix D Details of the Proof of Theorem 3
Before deriving the result, we first take a detour to investigate the problem under the setting of the Poisson point process, as a means of generating points in a uniform distribution.
1.1 Appendix D.1 Poisson Point Process
A Poisson point process [9] on \(\varOmega\) is a collection of random points such that the number of points \(N_j\) in any bounded measurable subset \(\varOmega _j\) with measure \(|\varOmega _j |\) is a Poisson random variable with rate \(\lambda |\varOmega _j |\), i.e., \(N_j\sim {\text {Poi}}(\lambda |\varOmega _j |)\). In other words, we take N, instead of being fixed, to be a Poisson random variable, \(N\sim {\text {Poi}}(\lambda )\), where the rate parameter \(\lambda\) is a constant. The underlying Poisson process is therefore homogeneous, and it enjoys the complete independence property: the number of points in each disjoint and bounded subregion is completely independent of all the others.
It follows naturally from these properties that the spatial locations of the points generated by the Poisson process are uniformly distributed. As a result, each realization of the homogeneous Poisson process is a uniform sampling of the underlying space with the number of points \(N\sim {\text {Poi}}(\lambda )\).
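As an illustration (not part of the paper), a homogeneous Poisson point process on the unit square can be simulated exactly as described: first draw \(N\sim {\text {Poi}}(\lambda )\), then place the points uniformly. The function name and parameter values below are illustrative choices:

```python
import numpy as np

def sample_poisson_process(lam, d=2, seed=None):
    """One realization of a homogeneous Poisson point process with
    rate `lam` on the unit cube [0, 1]^d: draw N ~ Poi(lam), then
    place the N points uniformly at random."""
    rng = np.random.default_rng(seed)
    n = rng.poisson(lam)                      # N ~ Poi(lambda)
    return rng.uniform(0.0, 1.0, size=(n, d))

# Complete independence: the count in any subregion is again Poisson
# with rate lam * measure(subregion); e.g. the left half-square below
# should contain roughly Poi(500) points.
pts = sample_poisson_process(lam=1000.0, d=2, seed=0)
left = np.sum(pts[:, 0] < 0.5)  # count in [0, 0.5] x [0, 1]
```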
Below we state a series of useful statistical properties and a concentration inequality for the Poisson random variable.
-
The Poisson random variable \(N\sim {\text {Poi}}(\lambda )\) has mean and variance both equal to \(\lambda\):
$$\begin{aligned}{\mathbb {E}}(N) = \text {Var}(N) = \lambda .\end{aligned}$$
-
The corresponding probability mass function is
$$\begin{aligned} {\mathbb {P}}({N=k}) = \frac{\text{e}^{-\lambda }\lambda ^k}{k!}, \quad k = 0, 1, 2, \cdots .\end{aligned}$$
-
A useful concentration inequality [43] (implying N scales linearly with \(\lambda\) with high probability):
$$\begin{aligned} {\mathbb {P}}(N\leqslant \lambda -\varepsilon ) \leqslant \text{e}^{-\frac{\varepsilon ^2}{2(\lambda +\varepsilon )}}\; \text { or } \;{\mathbb {P}}(|N-\lambda |\geqslant \varepsilon ) \leqslant 2 \text{e}^{-\frac{\varepsilon ^2}{2(\lambda +\varepsilon )}}. \end{aligned}$$(23)
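The concentration inequality (23) can be sanity-checked by Monte Carlo simulation; the rate \(\lambda =400\) and deviation \(\varepsilon =80\) below are arbitrary illustrative values:

```python
import numpy as np

# Empirically check the two-sided tail bound (23):
#   P(|N - lam| >= eps) <= 2 * exp(-eps^2 / (2*(lam + eps))).
rng = np.random.default_rng(1)
lam, eps = 400.0, 80.0
draws = rng.poisson(lam, size=200_000)
empirical = np.mean(np.abs(draws - lam) >= eps)           # observed frequency
bound = 2.0 * np.exp(-eps**2 / (2.0 * (lam + eps)))        # right-hand side of (23)
```

Since \(\varepsilon\) here is four standard deviations (\(\sqrt{\lambda }=20\)), both the empirical frequency and the bound are small, with the empirical value well below the bound.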
Furthermore, one can also derive a Markov-type inequality, different from (23), for the event that a Poisson random variable \(N\sim {\text {Poi}}(\lambda )\) exceeds some constant \(a>\lambda\) independent of \(\lambda\).
Proposition 2
A Poisson random variable \(N\sim {\text {Poi}}(\lambda )\) satisfies the following bound for any constant \(a>\lambda\):
$$\begin{aligned} {\mathbb {P}}(N\geqslant a) \leqslant \text{e}^{-\lambda }\left( \frac{\text{e}\lambda }{a}\right) ^{a}. \end{aligned}$$(24)
Proof
By Markov’s inequality, for any \(t>0\),
$$\begin{aligned} {\mathbb {P}}(N\geqslant a) = {\mathbb {P}}(\text{e}^{tN}\geqslant \text{e}^{ta}) \leqslant \text{e}^{-ta}\,{\mathbb {E}}(\text{e}^{tN}) = \text{e}^{-ta}\text{e}^{\lambda (\text{e}^t-1)}. \end{aligned}$$
To get a tighter bound, we minimize the right-hand side. Let \(\zeta =\text{e}^t>1\); then we minimize over \(\zeta\):
$$\begin{aligned} {\mathbb {P}}(N\geqslant a) \leqslant \text{e}^{-\lambda }\min _{\zeta >1} \text{e}^{\lambda \zeta }\zeta ^{-a}. \end{aligned}$$
A simple derivative test yields the global minimizer \(\zeta = \frac{a}{\lambda }>1\) since we require \(a>\lambda\). Thus,
$$\begin{aligned} {\mathbb {P}}(N\geqslant a) \leqslant \text{e}^{a-\lambda }\left( \frac{\lambda }{a}\right) ^{a} = \text{e}^{-\lambda }\left( \frac{\text{e}\lambda }{a}\right) ^{a}. \end{aligned}$$
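The resulting Chernoff-type bound can be checked numerically against the exact Poisson upper tail; the values \(\lambda =20\), \(a=35\) below are illustrative:

```python
import math

def poisson_upper_tail(lam, a):
    """Exact P(N >= a) for N ~ Poi(lam) and integer a, via the
    complementary sum of the probability mass function."""
    cdf = sum(math.exp(-lam) * lam**k / math.factorial(k) for k in range(a))
    return 1.0 - cdf

def chernoff_bound(lam, a):
    """The optimized Markov bound e^{-lam} * (e*lam/a)^a obtained
    from the minimizer zeta = a/lam (valid for a > lam)."""
    return math.exp(-lam) * (math.e * lam / a) ** a

lam, a = 20.0, 35
tail = poisson_upper_tail(lam, a)
bound = chernoff_bound(lam, a)
```

The exact tail always sits below the bound, and both decay rapidly once \(a\) is well above \(\lambda\).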
A direct consequence of (23) is that one can identify N with \(\lambda\) with high probability when \(\lambda\) is large, and vice versa.
Lemma 4
(Identify N with \(\lambda\)) A point set of cardinality \(N^*\) drawn from a uniform distribution can, with high probability, be regarded as a realization of a Poisson point process with rate \(\lambda\) such that
$$\begin{aligned} \frac{2N^*}{3} \leqslant \lambda \leqslant 2N^*. \end{aligned}$$
Proof
If \(N\sim {\text {Poi}}(\lambda )\), by taking \(\varepsilon = \frac{\lambda }{2}\), from (23) we have
$$\begin{aligned} {\mathbb {P}}\Big (N\leqslant \frac{\lambda }{2}\Big ) \leqslant \text{e}^{-\frac{\lambda }{12}}\; \text { and } \;{\mathbb {P}}\Big (N\geqslant \frac{3\lambda }{2}\Big ) \leqslant 2\text{e}^{-\frac{\lambda }{12}}. \end{aligned}$$
Let \(\lambda _u = 2N^*\) as a potential upper bound for \(\lambda\), while \(\lambda _l = \frac{2N^*}{3}\) the potential lower bound. Then for \(N_u\sim {\text {Poi}}(\lambda _u)\) and \(N_l\sim {\text {Poi}}(\lambda _l)\):
Therefore, if we have some other Poisson process with rate \(\lambda _1 >\lambda _u\) or \(\lambda _2<\lambda _l\), the probabilities of the corresponding Poisson variables \(N_1\sim {\text {Poi}}(\lambda _1), N_2\sim {\text {Poi}}(\lambda _2)\) achieving at most (or at least) \(N^*\) are bounded by
Note that both of these events have probability decaying to 0 as the observation \(N^* \rightarrow \infty\); therefore we have a confidence interval, with left margin \(\text{e}^{-\frac{N^*}{6}}\) and right margin \(\text{e}^{-\frac{N^*}{18}}\), allowing us to conclude that the Poisson parameter \(\lambda\) behind the observation \(N^*\) satisfies the bound
$$\begin{aligned} \frac{2N^*}{3} \leqslant \lambda \leqslant 2N^*. \end{aligned}$$
Since the margins shrink to 0 as \(N^* \rightarrow \infty\), we can identify \(\lambda\) with \(cN^*\) for some constant c of order 1 with high probability.
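A quick simulation illustrates Lemma 4: for a large rate \(\lambda\), the interval \([\frac{2N^*}{3},\, 2N^*]\) built from the observed count \(N^*\) contains the true \(\lambda\) with overwhelming probability. The rate and repetition count below are arbitrary:

```python
import numpy as np

# Draw many observations n_star from N ~ Poi(lam) and record how often
# the interval [2*n_star/3, 2*n_star] covers the true rate lam.
rng = np.random.default_rng(2)
lam = 300.0
n_star = rng.poisson(lam, size=50_000)
covered = np.mean((2 * n_star / 3 <= lam) & (lam <= 2 * n_star))
```

Coverage fails only when \(N^*\) deviates from \(\lambda\) by a constant factor, an event whose probability decays exponentially in \(\lambda\) by (23).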
By Lemma 4, in the remainder we approach the proof of Theorem 3 from the Poisson process perspective and derive results in terms of the Poisson parameter \(\lambda\).
1.2 Appendix D.2 Main Ideas of the Proof of Theorem 3
Consider a Poisson process with parameter \(\lambda\) in \(\text {supp}(\rho )\) and a corresponding point cloud \(\varGamma\) with cardinality \(N\sim {\text {Poi}}(\lambda )\). Based on the previous discussion from Sect. 3.1.3, we assume the ray \(\varvec{r}(s)\) lies entirely in the interior of \(\text {supp}(\rho )\). From Theorem 2, denoting by \(\{\varvec{x}_{k_i}\}_{i=1}^M\subset \varGamma\) the points of \(\varGamma\) sensed by the ray, with indices \(1\leqslant k_i\leqslant N\), and writing \(V_{k_i}:= V(\varvec{x}_{k_i})\) for their Voronoi cells, the line integral error is equivalently
To bound the above quantity, one needs to bound simultaneously: M, the number of Voronoi cells the ray passes through; the length of \(\varvec{r}(s)\) staying inside each \(V_{k_i}\); and the distance from each point of \(\varvec{r}(s)\) to the corresponding \(\varvec{x}_{k_i}\). Our key intuition is stated as follows.
Divide the ray \(\varvec{r}(s)\) of length 1 into segments each of length h, and consider a hypercylinder of height h and radius h centered on each segment. If there is at least one point from \(\varGamma\) in each of the hypercylinders, then no point along \(\varvec{r}(s)\) will have its nearest neighbor further than distance \(H = \sqrt{2}h\) away from \(\varvec{r}(s)\). Therefore, we restrict our focus to \(\varOmega\), a tubular neighborhood of distance H around \(\varvec{r}(s)\), that is, a “baguette-like” region with spherical end caps. \(N_{\varOmega }\), the number of points of \(\varGamma\) that are inside \(\varOmega\), will serve as an upper bound for M (the total number of unique nearest neighbors of \(\varvec{r}(s)\) in \(\varGamma\)), while the control of the other two quantities (intersecting length and distances to closest points) comes up naturally.
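The covering intuition can be verified in a small 2-D experiment (a sketch with illustrative values of h and point counts, not the paper's code): if every height-h box around a ray segment contains a point, then no sample on the ray has a nearest neighbor further than \(\sqrt{2}h\) away:

```python
import numpy as np

rng = np.random.default_rng(3)
h = 0.05
# Uniform points in the strip [0, 1] x [-h, h] around the unit-length
# ray r(s) = (s, 0); dense enough that each h-by-2h box is occupied.
pts = rng.uniform([0.0, -h], [1.0, h], size=(2000, 2))

# Check every "hypercylinder" (in 2-D: a box of height h around each
# ray segment [j*h, (j+1)*h]) contains at least one point.
Q = int(round(1.0 / h))
boxes_full = all(
    np.any((pts[:, 0] >= j * h) & (pts[:, 0] < (j + 1) * h))
    for j in range(Q)
)

# Nearest-neighbor distances from dense samples along the ray.
s = np.linspace(0.0, 1.0, 500)
ray = np.stack([s, np.zeros_like(s)], axis=1)
nn_dist = np.min(np.linalg.norm(ray[:, None, :] - pts[None, :, :], axis=2), axis=1)
```

When `boxes_full` holds, each ray sample has a point within axial distance h and radial distance h, hence within \(\sqrt{h^2+h^2}=\sqrt{2}h\).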
Clearly M depends on the size of \(\varOmega\), which is controlled by h. The magnitude of h therefore becomes the crucial factor we need to determine. The following lemma motivates the choice of \(h=\lambda ^{\!-\frac{1}{d}+\varepsilon }\) for some small \(1\gg \varepsilon >0\).
Lemma 5
Under A1–A7, for a point cloud of cardinality \(N\sim {\text {Poi}}(\lambda )\) generated from a Poisson point process, and a ray \(\varvec{r}(s)\) lying entirely in \(\text {supp}(\rho )\), the number of points \(N_\varOmega\) in the tubular neighborhood of radius \(H=\sqrt{2}h\) around \(\varvec{r}(s)\) will be bounded when
$$\begin{aligned} h = \lambda ^{-\frac{1}{d}+\varepsilon } \end{aligned}$$
for some small \(1\gg \varepsilon >0\), with probability \(\rightarrow 1\) as \(\lambda \rightarrow \infty\).
Proof
Note that the baguette region \(\varOmega\) has outer radius H, and the hypercylinders of radius h are contained inside \(\varOmega\). For simplicity we prescribe h such that \(Q = \frac{1}{h}\) is an integer; then the baguette region \(\varOmega\) consists of Q hypercylinders, denoted by \(\{\varOmega _j\}_{j=1}^Q\), and the remaining region, denoted by \(\varOmega _r\), consisting of an annular shell of outer radius H and inner radius h together with two half-balls of radius H on each end. Since the regions are disjoint, according to Appendix D.1 the Poisson process with rate \(\lambda\) in \(\text {supp}(\rho )\) induces a Poisson sub-process in each of the regions with rate proportional to its Lebesgue measure, and all the sub-processes are independent.
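The restriction property used here (the sub-process on a subregion is again Poisson, with rate \(\lambda\) times the region's measure) can be illustrated numerically; the rate and subregion below are arbitrary choices:

```python
import numpy as np

# For a rate-lam process on [0, 1]^2, the count in a subregion of
# area A should be Poi(lam * A); check its mean and variance.
rng = np.random.default_rng(5)
lam = 200.0   # subregion [0, 0.1] x [0, 1] has measure 0.1 -> rate 20
counts = []
for _ in range(5000):
    pts = rng.uniform(0.0, 1.0, size=(rng.poisson(lam), 2))
    counts.append(np.sum(pts[:, 0] < 0.1))
counts = np.asarray(counts, dtype=float)
```

Both the sample mean and the sample variance should be close to \(\lambda |\varOmega _j | = 20\), the signature of a Poisson count.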
Now, let \({\mathbb {P}}_Q\) denote the probability of having at least one point in each \(\varOmega _j\) while the number of points in each \(\varOmega _j\) is also uniformly bounded by some constant \(N_Q\). Since each \(\varOmega _j\) has the same measure, their corresponding Poisson processes have the identical rate \(\lambda _q = |\varOmega _1 |\lambda\). Let \(N_j\) denote the Poisson random variable for \(\varOmega _j\). Then,
Combined with (24) by requiring \(N_Q>\lambda _q\), this implies
and hence
The measure of the remaining region \(\varOmega _r\) is \(|\varOmega _r | = \omega _{d}H^{d} + \omega _{d-1}(H^{d-1}-h^{d-1})\), where \(\omega _d\) is the volume of the d-dimensional unit ball. Therefore the Poisson process on \(\varOmega _r\) has rate \(\lambda _r = |\varOmega _r |\lambda\). Let \(N_r\) denote the corresponding Poisson random variable; again by (24) with \(N'>\lambda _r\):
Since \(\varOmega _r\) and \(\bigcup \{\varOmega _j\}_{j=1}^Q\) are disjoint, by independence, the combined probability \(p_\textrm{tot}\) that all of the following events happen:
-
(i) the number of points \(N_j\) in each hypercylinder \(\varOmega _j\) is at least 1,
-
(ii) \(N_j\) is uniformly bounded above by some constant \(N_Q\),
-
(iii) the number of points \(N_r\) in the remaining region \(\varOmega _r = \varOmega \setminus \bigcup _j \varOmega _j\) is also bounded above by some constant \(N'\),
would have the lower bound:
Then with probability \(p_\textrm{tot}\), we have an upper bound for \(N_{\varOmega }\), the total number of points in \(\varOmega\):
Evidently \(N_{\varOmega }\) and \(p_\textrm{tot}\) are inter-dependent: as we tighten the R.H.S. bound in (29) by choosing a smaller \(N'\) or \(N_Q\), the bound for \(p_\textrm{tot}\) loosens. Motivated by Lemma 4, we set \(N' = \alpha \lambda _r\) and \(N_Q = \beta \lambda _q\) for some \(\alpha ,\beta >1\). Therefore, the next step is to determine the parameter set \(\{h,\,\alpha ,\,\beta \}\) that gives a more balanced bound on the R.H.S. of (29) while still ensuring that the probability of the undesired events decays exponentially.
For that purpose we need some optimization. We know
We need to investigate how h should scale with \(\lambda\), so we assume \(h\sim \lambda ^{-p}\) for some constant p to be determined. The following optimization procedure provides some motivation for choosing p. On the one hand, for the constraints, we need to ensure that the probability of each of the three events above failing decays to 0 as \(\lambda \rightarrow \infty\),
and representing all the quantities in terms of \(\lambda ,\, p,\, \alpha ,\,\beta\) and simplifying
On the other hand, for the objective, note that
since \(\alpha ,\,\beta\) are just constants \(>1\), we fix \(\alpha\) and \(\beta\), so that with \(h = \lambda ^{-p}\) we want to minimize h to obtain an upper bound for the total number of points in \(\varOmega\):
Combined with the bounds derived from the constraints, to minimize \(1-(d-1)p\) we need to maximize p; therefore we take \(p = \frac{1}{d}-\varepsilon\) for arbitrarily small \(\varepsilon >0\).
1.3 Appendix D.3 Proof of Theorem 3
Proof of Theorem 3
Consider a Poisson process with rate \(\lambda\) on \(\text {supp}(\rho )\). As in Appendix D.2, let \(Q = \frac{1}{h}\) be an integer for simplicity (or take the ceiling if desired), and consider Q hypercylinders of radius h centered along \(\varvec{r}(s)\). Again as in Appendix D.2, let \(\varOmega\) be the tubular neighborhood of distance \(H = \sqrt{2}h\) around \(\varvec{r}(s)\). Motivated by Appendix D.2, we set
$$\begin{aligned} h = \lambda ^{-\frac{1}{d}+\varepsilon } \end{aligned}$$
for some small constant \(1\gg \varepsilon >0\) to be determined.
Divide the tubular neighborhood \(\varOmega\) into two parts: one consists of the set of hypercylinders \(\bigcup _{j=1}^Q \varOmega _j\) around \(\varvec{r}(s)\); the other is the remainder \(\varOmega _r\). Following the setting of Lemma 5, let \(N' = \alpha \lambda _r\) bound the number of points in \(\varOmega _r\) and \(N_Q = \beta \lambda _q\) the number in each \(\varOmega _j\), and set \(\alpha =\beta = \text{e} > 1\) (also satisfying the constraints in Lemma 5) to simplify the calculations, so that \(N_Q = \text{e}\lambda _q\), \(N' = \text{e}\lambda _r\), and equation (27) becomes
So the total number in \(\bigcup _{j=1}^Q \varOmega _j\) is bounded by \(Q \text{e}\lambda _q\) while there is still at least one point in every \(\varOmega _j\) with the above probability. On the other hand for (28),
Again by the same independence argument, the total probability that all the events happen has the following lower bound:
And the total number of points inside \(\varOmega\) is bounded by
Finally, when there is at least one point in each \(\varOmega _j\), the maximum distance from any point on \(\varvec{r}(s)\) to its nearest neighbor is \(H=\sqrt{2}h\), as we have argued. Furthermore, under this setting, for any of the potential nearest neighbors, the maximum length over which \(\varvec{r}(s)\) intersects its Voronoi cell has an upper bound of 3h. Therefore, the line integral error (26) is bounded by
It remains to consider the total probability \(p_\textrm{tot}\):
and recall from (30): \(\lambda _q = |\varOmega _1 |\lambda = \omega _{d-1}h^{d-1}h\lambda = \omega _{d-1}h^d\lambda = \omega _{d-1}\lambda ^{\varepsilon d}\). Then
The above convergence can be shown by taking the natural log:
since \(\ln {\lambda }\) grows slower than \(\lambda ^{\varepsilon }\) for any \(\varepsilon >0\). As for the last term \(\text{e}^{-|\varOmega _r |\lambda }\):
Thus, the probability \(p_\textrm{tot}\rightarrow 1\) as \(\lambda \rightarrow \infty\), and we have our line integral error \(\leqslant c(d,J) \lambda ^{-\frac{1}{d}+\varepsilon (d+1)} \rightarrow 0\) as long as \(\varepsilon <\frac{1}{(d+1)^2}\). To obtain the convergence in terms of the actual number of points N in the point cloud, we invoke Lemma 4 and set \(N=c\lambda\) to conclude the proof.
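To illustrate the convergence just proved, the following sketch approximates a line integral by the piecewise-constant nearest-neighbor (Voronoi) rule and checks that the error is small once \(\lambda\) is large. The test function f, the ray placement, and the rates are illustrative assumptions, not from the paper:

```python
import numpy as np

def nn_line_integral(pts, f, n_samples=400):
    """Riemann sum of f along the ray r(s) = (s, 0.5), s in [0, 1],
    evaluating f at the nearest point-cloud neighbor of each ray
    sample: the piecewise-constant approximation analyzed above."""
    s = np.linspace(0.0, 1.0, n_samples)
    ray = np.stack([s, np.full_like(s, 0.5)], axis=1)
    d2 = ((ray[:, None, :] - pts[None, :, :]) ** 2).sum(axis=2)
    nearest = pts[np.argmin(d2, axis=1)]   # nearest neighbor per sample
    return f(nearest).mean()               # the ray has unit length

def f(p):
    return np.sin(np.pi * p[:, 0]) + p[:, 1]

exact = 2.0 / np.pi + 0.5   # integral of sin(pi*s) + 0.5 over [0, 1]
rng = np.random.default_rng(4)
errs = []
for lam in (200, 5000):
    pts = rng.uniform(0.0, 1.0, size=(rng.poisson(lam), 2))
    errs.append(abs(nn_line_integral(pts, f) - exact))
```

As the rate grows, the Voronoi cells shrink and the nearest-neighbor values track f along the ray, consistent with the \(\lambda ^{-\frac{1}{d}+\varepsilon (d+1)}\) error bound.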
Cite this article
Liu, L., Ly, L., Macdonald, C.B. et al. Nearest Neighbor Sampling of Point Sets Using Rays. Commun. Appl. Math. Comput. (2023). https://doi.org/10.1007/s42967-023-00318-1