Isolation Kernel Estimators

  • Regular Paper, Knowledge and Information Systems

Abstract

Existing adaptive kernel density estimators (KDEs) and kernel regressions (KRs) often employ a data-independent kernel, such as the Gaussian kernel. To produce better estimates, they require an additional mechanism that adapts the kernel bandwidth locally to the given dataset, and this comes at a high computational cost. In this paper, we show that adaptive KDEs and KRs can be derived directly from Isolation Kernel with constant-time complexity for each estimation. The resultant estimators, called IKDE and IKR, are the first KDE and KR that are both fast and adaptive. We demonstrate the superior efficiency and efficacy of IKDE and IKR in anomaly detection and regression tasks, respectively.

Notes

  1. This is despite the fact that all adapted bandwidths can be found in the preprocessing step using the second approach.

  2. We note a typo in the original equation \(\sigma _x = \min (mean_{y \in kNN(x)} \parallel x-y \parallel , \varepsilon )\) in paper [13]: the min function should be the max function; otherwise, most points would have the same bandwidth \(\varepsilon \) (see the sketch below).
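
For illustration only, the following is a minimal sketch (not the authors' code) of the corrected per-point bandwidth rule, assuming Euclidean distance; the function name, the choice of k and the floor value \(\varepsilon \) are placeholders.

```python
import numpy as np

def knn_bandwidths(X, k=5, eps=1e-3):
    """Per-point bandwidths sigma_x = max(mean_{y in kNN(x)} ||x - y||, eps).

    Using max (not min), as noted above, prevents most points from
    collapsing to the same bandwidth eps.
    """
    X = np.asarray(X, dtype=float)
    # Pairwise Euclidean distances; O(n^2), fine for a small illustration.
    d = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=-1)
    np.fill_diagonal(d, np.inf)          # exclude each point itself
    knn = np.sort(d, axis=1)[:, :k]      # distances to the k nearest neighbours
    return np.maximum(knn.mean(axis=1), eps)

# A dense cluster plus one distant point: the isolated point gets a larger bandwidth.
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(0, 0.1, size=(50, 2)), [[5.0, 5.0]]])
print(knn_bandwidths(X, k=5)[-3:])
```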

References

  1. Terrell GR, Scott DW (1992) Variable kernel density estimation. Ann Stat 20(3):1236–1265

  2. Rosenblatt M (1956) Remarks on some nonparametric estimates of a density function. Ann Math Stat 27(3):832–837

  3. Gentleman WM, Sande G (1966) Fast Fourier transforms—for fun and profit. In: Proceedings of AFIPS fall joint computer conference, pp 563–578

  4. Scott DW (1981) Using computer-binned data for density estimation. In: Proceedings of the 13th symposium on the interface of computer science and statistics. Springer, New York, pp 292–294

  5. Silverman BW (1982) Algorithm AS 176: kernel density estimation using the Fast Fourier Transform. J Roy Stat Soc: Ser C (Appl Stat) 31(1):93–99

  6. Wand MP (1994) Fast computation of multivariate kernel estimators. J Comput Graph Stat 3(4):433–445

  7. O’Brien TA, Kashinath K, Cavanaugh NR, Collins WD, O’Brien JP (2016) A fast and objective multidimensional kernel density estimation method: fastKDE. Comput Stat Data Anal 101:148–160

  8. Scott DW (2015) Multivariate density estimation: theory, practice, and visualization, 2nd edn. John Wiley & Sons Inc, Hoboken

  9. Williams CKI, Seeger M (2001) Using the Nyström method to speed up kernel machines. In: Leen TK, Dietterich TG, Tresp V (eds) Advances in neural information processing systems, vol 13, pp 682–688

  10. Luo C, Shrivastava A (2018) Arrays of (locality-sensitive) count estimators (ACE): high-speed anomaly detection via cache lookups. In: Proceedings of the world wide web conference, pp 1439–1448

  11. Coleman B, Shrivastava A (2020) Sub-linear RACE sketches for approximate kernel density estimation on streaming data. In: Proceedings of the world wide web conference, pp 1739–1749

  12. Backurs A, Indyk P, Wagner T (2019) Space and time efficient kernel density estimation in high dimensions. In: Advances in neural information processing systems, vol 32, pp 15799–15808

  13. Schubert E, Zimek A, Kriegel H-P (2014) Generalized outlier detection with flexible kernel density estimates. In: Proceedings of the SIAM conference on data mining, pp 542–550

  14. Ting KM, Zhu Y, Zhou Z-H (2018) Isolation kernel and its effect on SVM. In: Proceedings of the 24th ACM SIGKDD international conference on knowledge discovery and data mining. ACM, pp 2329–2337

  15. Watson GS (1964) Smooth regression analysis. Sankhyā: Indian J Stat Ser A 26(4):359–372

  16. Sain SR (2002) Multivariate locally adaptive density estimation. Comput Stat Data Anal 39:165–186

  17. Zhang L, Lin J, Karim R (2018) Adaptive kernel density-based anomaly detection for nonlinear systems. Knowl Based Syst 139:50–63

  18. Phillips JM, Tai WM (2020) Near-optimal coresets of kernel density estimates. Discrete Comput Geom 63:867–887

  19. Chen Y, Welling M, Smola A (2010) Super-samples from kernel herding. In: Proceedings of the 26th conference on uncertainty in artificial intelligence, pp 109–116

  20. Cortes EC, Scott C (2017) Sparse approximation of a kernel mean. IEEE Trans Signal Process 65(5):1310–1323

  21. Siminelakis P, Rong K, Bailis P, Charikar M, Levis P (2019) Rehashing kernel evaluation in high dimensions. In: Proceedings of the 36th international conference on machine learning, pp 5789–5798

  22. Charikar MS (2002) Similarity estimation techniques from rounding algorithms. In: Proceedings of the 34th ACM symposium on theory of computing, pp 380–388

  23. Datar M, Immorlica N, Indyk P, Mirrokni VS (2004) Locality-sensitive hashing scheme based on p-stable distributions. In: Proceedings of the 20th annual symposium on computational geometry, pp 253–262

  24. Charikar M, Siminelakis P (2017) Hashing-based-estimators for kernel density in high dimensions. In: 2017 IEEE 58th annual symposium on foundations of computer science (FOCS). IEEE, pp 1032–1043

  25. Qin X, Ting KM, Zhu Y, Lee VCS (2019) Nearest-neighbour-induced isolation similarity and its impact on density-based clustering. In: Proceedings of the thirty-third AAAI conference on artificial intelligence, pp 4755–4762

  26. Aurenhammer F (1991) Voronoi diagrams—a survey of a fundamental geometric data structure. ACM Comput Surv 23(3):345–405

  27. Alishahi K, Sharifitabar M (2008) Volume degeneracy of the typical cell and the chord length distribution for Poisson-Voronoi tessellations in high dimensions. Adv Appl Probab 40(4):919–938

  28. Calka P, Chenavier N (2014) Extreme values for characteristic radii of a Poisson-Voronoi tessellation. Extremes 17(3):359–385

  29. Silverman BW (1986) Density estimation for statistics and data analysis. Chapman and Hall, London

  30. Ting KM, Xu B-C, Washio T, Zhou Z-H (2020) Isolation distributional kernel: a new tool for kernel based anomaly detection. In: Proceedings of the 26th ACM SIGKDD international conference on knowledge discovery and data mining, pp 198–206

  31. Kim J, Scott CD (2012) Robust kernel density estimation. J Mach Learn Res 13(82):2529–2565

  32. Musco C, Musco C (2017) Recursive sampling for the Nyström method. In: Advances in neural information processing systems, pp 3833–3845

  33. Schölkopf B, Platt JC, Shawe-Taylor JC, Smola AJ, Williamson RC (2001) Estimating the support of a high-dimensional distribution. Neural Comput 13(7):1443–1471

  34. Liu FT, Ting KM, Zhou Z-H (2008) Isolation Forest. In: Proceedings of the IEEE international conference on data mining, pp 413–422

  35. Breunig MM, Kriegel H-P, Ng RT, Sander J (2000) LOF: identifying density-based local outliers. In: Proceedings of the ACM SIGMOD international conference on management of data, pp 93–104

  36. Nadaraya EA (1964) On estimating regression. Theory Probab Appl 9(1):141–142

  37. Noda K (1976) Estimation of a regression function by the Parzen kernel-type density estimators. Ann Inst Stat Math 28(1):221–234

  38. Tosatto S, Akrour N, Peters J (2020) An upper bound of the bias of Nadaraya–Watson kernel regression under Lipschitz assumptions. Stats 4(1):1–17

  39. Dua D, Graff C (2017) UCI machine learning repository. http://archive.ics.uci.edu/ml

  40. Andoni A, Razenshteyn I (2015) Optimal data-dependent hashing for approximate near neighbors. In: Proceedings of the forty-seventh annual ACM symposium on theory of computing, pp 793–801

  41. Andoni A, Laarhoven T, Razenshteyn I, Waingarten E (2017) Optimal hashing-based time-space trade-offs for approximate near neighbors. In: Proceedings of the twenty-eighth annual ACM-SIAM symposium on discrete algorithms, pp 47–66

  42. Lever G, Diethe T, Shawe-Taylor J (2012) Data dependent kernels in nearly-linear time. In: Proceedings of the fourteenth international conference on artificial intelligence and statistics, pp 685–693

  43. Ionescu C, Popa A, Sminchisescu C (2017) Large-scale data-dependent kernel approximation. In: Proceedings of the 20th international conference on artificial intelligence and statistics, pp 19–27

  44. Muandet K, Fukumizu K, Sriperumbudur B, Schölkopf B (2017) Kernel mean embedding of distributions: a review and beyond. Found Trends Mach Learn 10(1–2):1–141

  45. Ting KM, Washio T, Wells JR, Zhang H (2021) Isolation kernel density estimation. In: Proceedings of IEEE international conference on data mining, pp 619–628

Acknowledgements

A preliminary version of this paper was presented at the 2021 IEEE International Conference on Data Mining [45]; it reported the work on kernel density estimation only. Kai Ming Ting was supported by the Natural Science Foundation of China (62076120).

Author information

Corresponding author

Correspondence to Kai Ming Ting.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Appendices

Appendix A: Experimental settings

The experiments ran on a Linux CPU machine with a 16-core AMD CPU (2 GHz per core) and 32 GB RAM. The IK feature mapping ran on a GPU machine with 2 x GTX 1080 Ti (3584 CUDA cores at 1.6 GHz and 12 GB graphics memory) and an Intel i9-7900X 3.30 GHz CPU (20 cores) with 64 GB RAM. Only the feature mapping of IK was run on the GPU; all other IKDE processes ran on the CPU.

We used the following code:

* Isolation Kernel:

https://github.com/IsolationKernel/Codes/tree/main/IDK/IKDE to build IKDE;

* LOF:

https://github.com/scikit-learn/scikit-learn/blob/15a949460/sklearn/neighbors/_lof.py.

* All other code is from the respective authors.

The reported results for the randomized algorithms (IKDE, RACE and HBE) are averaged over five trials. The results of the deterministic algorithms (e.g., OKDE and KDEOS) are obtained from a single trial.

All datasets are normalized to [0, 1] during preprocessing.
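
As a minimal illustration of this setup (not the authors' pipeline), the sketch below normalizes features to [0, 1] and scores anomalies with the scikit-learn LOF implementation listed above; the synthetic data and parameter values are placeholders.

```python
import numpy as np
from sklearn.neighbors import LocalOutlierFactor
from sklearn.preprocessing import MinMaxScaler

# Toy data: an inlier cluster plus a few scattered points acting as anomalies.
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(0, 1, size=(200, 4)), rng.uniform(-6, 6, size=(10, 4))])

X01 = MinMaxScaler().fit_transform(X)        # per-feature scaling to [0, 1]
lof = LocalOutlierFactor(n_neighbors=20)     # k is a tuning parameter (cf. Table 6)
labels = lof.fit_predict(X01)                # -1 marks predicted outliers
scores = -lof.negative_outlier_factor_       # larger score = more anomalous
print((labels == -1).sum(), np.round(scores[-3:], 2))
```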

Sources of datasets used for anomaly detection are:

* https://lapad-web.icmc.usp.br/repositories/outlier-evaluation/DAMI,

* https://odds.cs.stonybrook.edu,

* https://sites.google.com/site/gspangsite,

* https://www.csie.ntu.edu.tw/~cjlin/libsvmtools/datasets/.

The parameter search ranges are shown in Table 6.

Table 6 Parameter search ranges

Appendix B: Proof of Theorem 1

Proof

Since every \(y \in Y\) is drawn as \(y \sim {\mathcal {P}}_Y\), we derive the following expectations.

$$\begin{aligned} {{\mathbb {E}}}_{\rho _Y}[{\bar{\kappa }}_{\psi }( x, y\vert U)]&= \int _{{\mathcal {X}}_Y} \rho _Y(y) \frac{\psi }{ \vert {\mathcal {X}}_U \vert } \sum _{H \in \mathbb {H}_\psi (U)} p(H) \int _{\theta [z \vert x]} \delta (r-y)\,\mathrm{d}r\,\mathrm{d}y\\&= \frac{\psi }{ \vert {\mathcal {X}}_U \vert } \sum _{H \in \mathbb {H}_\psi (U)} p(H) \int _{{\mathcal {X}}_Y} \rho _Y(y) \int _{\theta [z \vert x]} \delta (r-y)\,\mathrm{d}r\,\mathrm{d}y\\&= \frac{\psi }{ \vert {\mathcal {X}}_U \vert } \sum _{H \in \mathbb {H}_\psi (U)} p(H) \int _{\theta [z \vert x]} \rho _Y(y)\,\mathrm{d}y\\ {{\mathbb {E}}}_{\rho _Y}[{\bar{\kappa }}_{\psi }( x, y\vert U)^2]&= \int _{{\mathcal {X}}_Y} \rho _Y(y) \left\{ \frac{\psi }{ \vert {\mathcal {X}}_U \vert } \sum _{H \in \mathbb {H}_\psi (U)} p(H) \int _{\theta [z \vert x]} \delta (r-y)\,\mathrm{d}r \right\} ^2\,\mathrm{d}y\\&= \frac{\psi ^2}{ \vert {\mathcal {X}}_U \vert ^2} \sum _{H \in \mathbb {H}_\psi (U)} p(H) \int _{{\mathcal {X}}_Y} \rho _Y(y) \left( \int _{\theta [z \vert x]} \delta (r-y)dr\right) ^2\,\mathrm{d}y \\&\quad + \frac{2\psi ^2}{ \vert {\mathcal {X}}_U \vert ^2} \sum _{H,H' \in \mathbb {H}_\psi (U)} p(H)p(H')\\&\quad \times \int _{{\mathcal {X}}_Y} \rho _Y(y) \int _{\theta [z \vert x]} \delta (r-y)\,\mathrm{d}r\int _{\theta '[z \vert x]} \delta (r'-y)\,\mathrm{d}r'\,\mathrm{d}y \\&= \frac{\psi ^2}{ \vert {\mathcal {X}}_U \vert ^2} \sum _{H \in \mathbb {H}_\psi (U)} p(H) \int _{{\mathcal {X}}_Y} \rho _Y(y) \int _{\theta [z \vert x]} \delta (r-y)\,\mathrm{d}r\,\mathrm{d}y \\&\quad + \frac{2\psi ^2}{ \vert {\mathcal {X}}_U \vert ^2} \sum _{H,H' \in \mathbb {H}_\psi (U)} p(H)p(H') \int _{{\mathcal {X}}_Y} \rho _Y(y) \int _{\theta [z \vert x] \cap \theta '[z' \vert x]} \delta (r-y)\,\mathrm{d}r\,\mathrm{d}y \\&= \frac{\psi ^2}{ \vert {\mathcal {X}}_U \vert ^2} \sum _{H \in \mathbb {H}_\psi (U)} p(H) \int _{\theta [z \vert x]} \rho _Y(y)\,\mathrm{d}y\\&\quad + \frac{2\psi ^2}{ \vert {\mathcal {X}}_U \vert ^2} \sum _{H,H' \in \mathbb {H}_\psi (U)} p(H)p(H') \int _{\theta [z \vert x] \cap \theta '[z' \vert x]} \rho _Y(y)\,\mathrm{d}y \end{aligned}$$

By Lipschitz continuity of \(\rho _Y(y)\) and \(x,y \in \theta [z]\), the following relation holds in \(\theta [z]\).

$$\begin{aligned} \rho _Y(y)-\rho _Y(x) \le \vert \rho _Y(y)-\rho _Y(x) \vert \le L_Y \vert y-x \vert \le 2 L_Y R_{{\textit{max}}}(\psi ). \end{aligned}$$
(B1)

These expressions give the following bias and variance of \({\bar{f}}_{\psi }(x \vert Y)\).

$$\begin{aligned} {\textit{Bias}}_\psi [{\bar{f}}_{\psi }(x \vert Y)]&= {{\mathbb {E}}}_{\rho _Y}[{\bar{\kappa }}_{\psi }( x, y\vert U)] - \rho _Y(x)\\&= {{\mathbb {E}}}_{\rho _Y}[{\bar{\kappa }}_{\psi }( x, y\vert U)] - \frac{\psi }{ \vert {\mathcal {X}}_U \vert }{{\mathbb {E}}}_{\mathbb {H}_\psi (U)}[\ \left| \theta [z \vert x]\right| ]\rho _Y(x)\\&= \frac{\psi }{ \vert {\mathcal {X}}_U \vert } \sum _{H \in \mathbb {H}_\psi (U)} p(H) \int _{\theta [z \vert x]} (\rho _Y(y) - \rho _Y(x))\,\mathrm{d}y\\&\le \frac{\psi }{ \vert {\mathcal {X}}_U \vert } \sum _{H \in \mathbb {H}_\psi (U)} p(H) \int _{\theta [z \vert x]} 2 L_Y R_{{\textit{max}}}(\psi )\,\mathrm{d}r\\&= \frac{2 L_Y \psi }{ \vert {\mathcal {X}}_U \vert } \sum _{H \in \mathbb {H}_\psi (U)} p(H) R_{{\textit{max}}}(\psi ) \left| \theta [z \vert x]\right| \\&\le \frac{2 L_Y \psi }{ \vert {\mathcal {X}}_U \vert } {{\mathbb {E}}}_{\mathbb {H}_\psi (U)}[R_{{\textit{max}}}(\psi )] {{\mathbb {E}}}_{\mathbb {H}_\psi (U)}[\left| \theta [z \vert x]\right| ]\\&= 2 L_Y {{\mathbb {E}}}_{\mathbb {H}_\psi (U)}[R_{{\textit{max}}}(\psi )]. \end{aligned}$$

The second line follows from (5) and the homogeneity of \(\theta [z]\), the fourth line follows from (B1), and the last two lines follow from the Cauchy–Schwarz inequality and (5).

$$\begin{aligned} {\textit{Var}}_\psi [{\bar{f}}_{\psi }(x \vert Y)]&= \frac{1}{ n } ({{\mathbb {E}}}_{\rho _Y}[{\bar{\kappa }}_{\psi }( x, y\vert U)^2] - {{\mathbb {E}}}_{\rho _Y}[{\bar{\kappa }}_{\psi }( x, y\vert U)]^2)\\&\le \frac{1}{ n } {{\mathbb {E}}}_{\rho _Y}[{\bar{\kappa }}_{\psi }( x, y\vert U)^2]\\&\le \frac{\psi ^2}{ n \vert {\mathcal {X}}_U \vert ^2} \sum _{H \in \mathbb {H}_\psi (U)} p(H) \int _{\theta [z \vert x]} (\rho _Y(x) + 2 L_Y R_{{\textit{max}}}(\psi ))\,\mathrm{d}y \\&\quad + \frac{2\psi ^2}{ n \vert {\mathcal {X}}_U \vert ^2} \sum _{H,H' \in \mathbb {H}_\psi (U)} p(H)p(H')\\&\quad \times \int _{\theta [z \vert x] \cap \theta '[z' \vert x]} (\rho _Y(x) + 2 L_Y R_{{\textit{max}}}(\psi ))\,\mathrm{d}y\\&= \frac{\rho _Y(x)\psi ^2}{ n \vert {\mathcal {X}}_U \vert ^2} {{\mathbb {E}}}_{\mathbb {H}_\psi (U)}[\left| \theta [z \vert x]\right| ] \\&\quad + \frac{2\rho _Y(x)\psi ^2}{ n \vert {\mathcal {X}}_U \vert ^2} {{\mathbb {E}}}_{\mathbb {H}_\psi (U) \times \mathbb {H}_\psi (U)}[\left| \theta [z \vert x] \cap \theta '[z' \vert x]\right| ]\\&\quad + \frac{2 L_Y \psi ^2}{ n \vert {\mathcal {X}}_U \vert ^2} {{\mathbb {E}}}_{\mathbb {H}_\psi (U)}[R_{{\textit{max}}}(\psi )\left| \theta [z \vert x]\right| ]\\&\quad + \frac{4 L_Y \psi ^2}{ n \vert {\mathcal {X}}_U \vert ^2} {{\mathbb {E}}}_{\mathbb {H}_\psi (U) \times \mathbb {H}_\psi (U)}[R_{{\textit{max}}}(\psi )\left| \theta [z \vert x] \cap \theta '[z' \vert x]\right| ]\\&\le \frac{\rho _Y(x)\psi ^2}{ n \vert {\mathcal {X}}_U \vert ^2} {{\mathbb {E}}}_{\mathbb {H}_\psi (U)}[\left| \theta [z \vert x]\right| ] \\&\quad + \frac{2\rho _Y(x)\psi ^2}{ n \vert {\mathcal {X}}_U \vert ^2} {{\mathbb {E}}}_{\mathbb {H}_\psi (U) \times \mathbb {H}_\psi (U)}[\left| \theta [z \vert x] \cap \theta '[z' \vert x]\right| ]\\&\quad + \frac{2 L_Y \psi ^2}{ n \vert {\mathcal {X}}_U \vert ^2} {{\mathbb {E}}}_{\mathbb {H}_\psi (U)}[R_{{\textit{max}}}(\psi )]{{\mathbb {E}}}_{\mathbb {H}_\psi (U)}[\left| \theta [z \vert x]\right| ]\\&\quad + \frac{4 L_Y \psi ^2}{ n \vert {\mathcal {X}}_U \vert ^2} {{\mathbb {E}}}_{\mathbb {H}_\psi (U)}[R_{{\textit{max}}}(\psi )]{{\mathbb {E}}}_{\mathbb {H}_\psi (U) \times \mathbb {H}_\psi (U)}[\left| \theta [z \vert x] \cap \theta '[z' \vert x]\right| ]\\&= \frac{\psi +2}{ n \vert {\mathcal {X}}_U \vert } (\rho _Y(x) + 2 L_Y {{\mathbb {E}}}_{\mathbb {H}_\psi (U)}[R_{{\textit{max}}}(\psi )]). \end{aligned}$$

The inequality from the third to the fifth lines follows from (B1). The last two lines follow from the Cauchy–Schwarz inequality, (5) and Proposition 1.

By Proposition 2, \({\textit{Bias}}_\psi [{\bar{f}}_{\psi }(x \vert Y)] \rightarrow 0\) as \(\psi \rightarrow \infty \). Accordingly, both \({\textit{Bias}}_\psi [{\bar{f}}_{\psi }(x \vert Y)]\) and \({\textit{Var}}_\psi [{\bar{f}}_{\psi }(x \vert Y)]\) converge to 0 as \( n \rightarrow \infty \) with \(\psi \propto n^\alpha \) and \(0<\alpha <1\). \(\square \)

Appendix C: Analysis of consistency of IKR \(_V\)

The Nadaraya–Watson-type kernel regression on a one-dimensional data space is known to be consistent [37] if its kernel function \(\kappa (x,y)\) is finite, absolutely integrable and normalized, and satisfies:

$$\begin{aligned} \lim _{\vert x-y \vert \rightarrow \infty } \vert (x-y)\kappa (x,y) \vert = 0, \end{aligned}$$
(C2)

and its kernel bandwidth h depends on the size n of the dataset W as

$$\begin{aligned} \lim _{n \rightarrow \infty } h(n) = 0, \end{aligned}$$
(C3)

and

$$\begin{aligned} \lim _{n \rightarrow \infty } n h(n) = \infty . \end{aligned}$$
(C4)

This consistency of the kernel regression holds for both fixed and variable kernel bandwidths, as long as h satisfies the last two conditions. GKR, GKR\(_N\) and GKROS all satisfy these conditions and are therefore consistent.
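
To make these conditions concrete, here is a minimal Nadaraya–Watson sketch with a Gaussian kernel and the bandwidth schedule \(h(n) \propto n^{-1/5}\), which satisfies (C3) and (C4); the data and the constant factor are illustrative and not the paper's GKR implementation.

```python
import numpy as np

def nadaraya_watson(x_query, X, Y, h):
    # f(x) = sum_i K((x - x_i)/h) y_i / sum_i K((x - x_i)/h) with a Gaussian K.
    w = np.exp(-0.5 * ((x_query[:, None] - X[None, :]) / h) ** 2)
    return (w * Y[None, :]).sum(axis=1) / w.sum(axis=1)

rng = np.random.default_rng(0)
n = 2000
X = rng.uniform(0, 1, n)
Y = np.sin(2 * np.pi * X) + rng.normal(0, 0.2, n)

h = 0.2 * n ** (-1 / 5)              # h(n) -> 0 and n h(n) -> infinity: (C3) and (C4) hold
xq = np.linspace(0.1, 0.9, 5)
print(np.round(nadaraya_watson(xq, X, Y, h), 2))
print(np.round(np.sin(2 * np.pi * xq), 2))   # noise-free target for comparison
```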

The kernel \(\kappa _\psi (x,y \vert U)\) of the Voronoi diagram-based Isolation Kernel Regression (IKR\(_V\)), shown in Eq. (4), is finite, absolutely integrable and can be normalized. When we choose \(\psi \propto n^\alpha \) (\(0<\alpha <1\)) as stated in Theorem 1, the upper bound on half the size of a Voronoi partition, i.e., \(R_{{\textit{max}}}(\psi )\), in one-dimensional data space is probabilistically bounded for large n. That is, for large \(\psi \), Eqs. (6) and (7) give

$$\begin{aligned} \forall \epsilon > 0, P(R_{{\textit{max}}}(\psi ) \le \epsilon ) \simeq e^{-c \psi e^{- \psi \epsilon }}. \end{aligned}$$

Equation (2) means \(\kappa _\psi (x,y \vert U) = P(x,y \in \theta [z] \vert \theta [z] \in H)\). Thus, \(\kappa _\psi (x,y \vert U)\) is upper bounded by setting \(\epsilon = \vert x-y \vert / 2\) as

$$\begin{aligned} \kappa _\psi (x,y \vert U) \le P(R_{{\textit{max}}}(\psi ) > \vert x-y \vert / 2) \simeq 1- e^{-c \psi e^{- \psi \vert x-y \vert / 2}}. \end{aligned}$$

Accordingly, the following formula holds.

$$\begin{aligned} \vert (x-y)\kappa _\psi (x,y \vert U)\vert \lesssim \vert x-y \vert (1- e^{-c \psi e^{- \psi \vert x-y \vert / 2}}). \end{aligned}$$

By applying L’Hospital’s rule three times, the r.h.s. can be shown to converge to zero in the limit \(\vert x -y \vert \rightarrow \infty \). Thus, the Isolation Kernel also satisfies Eq. (C2).
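
A quick numerical check of this bound (with illustrative constants \(c\) and \(\psi \)) shows the right-hand side peaking at small \(\vert x-y \vert \) and then decaying rapidly towards zero, consistent with Eq. (C2).

```python
import numpy as np

c, psi = 1.0, 64.0
t = np.array([0.01, 0.1, 0.5, 1.0, 5.0])                   # t stands for |x - y|
rhs = t * (1.0 - np.exp(-c * psi * np.exp(-psi * t / 2.0)))
print(rhs)   # rises for tiny t, then decays towards 0 as t grows
```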

From Proposition 2, the size of a Voronoi partition, which corresponds to h(n) and is upper bounded by \(2R_{{\textit{max}}}(\psi )\), converges to zero as \(n \rightarrow \infty \). Thus, the condition in Eq. (C3) holds.

Furthermore, let \(R_{{\textit{min}}}(\psi )\) be the minimum inscribed radius of a Voronoi partition \(\theta [z] \in H\) generated by a homogeneous Poisson point process of intensity \(\psi / \vert {\mathcal {X}}_U \vert \). Calka and Chenavier [28] showed that

$$\begin{aligned} P(R_{{\textit{min}}}(\psi ) \ge \epsilon ) = e^{-c_3 \psi ^2 \epsilon } \end{aligned}$$

holds in the limit \(\psi \rightarrow \infty \) for one-dimensional space, where \(c_3\) is a positive constant. This fact implies

$$\begin{aligned} P(n R_{{\textit{min}}}(\psi ) \ge \delta ) = e^{-c_3 \psi ^2 \delta /n} \end{aligned}$$

in the limit \(\psi \rightarrow \infty \). If we set \(\psi \propto n^\alpha \) (\(0<\alpha <1/2\)) and let \(n \rightarrow \infty \), then \(n R_{{\textit{min}}}(\psi ) \ge \delta \) holds for any large \(\delta \) with probability one. Thus, \(n R_{{\textit{min}}}(\psi ) \rightarrow \infty \) with probability one as \(n \rightarrow \infty \). This implies Eq. (C4) because h(n) is lower bounded by \(2R_{{\textit{min}}}(\psi )\).

The above analysis shows that the consistency of IKR\(_V\) on one-dimensional data space is as good as that of the Gaussian kernel-based regression. The hypersphere-based IKR\(_S\) may also have a similar characteristic.

Current analyses of the behavior of the Nadaraya–Watson-type kernel regression in high-dimensional data spaces are limited in scope. For example, a recent analysis [38] derives an upper bound on the regression bias only. We conjecture that the consistency of kernel regression in high-dimensional data spaces is similar to the one-dimensional case.

Cite this article

Ting, K.M., Washio, T., Wells, J. et al. Isolation Kernel Estimators. Knowl Inf Syst 65, 759–787 (2023). https://doi.org/10.1007/s10115-022-01765-7
