
On the Cost of Fixed Partial Match Queries in K-d Trees


Abstract

Partial match queries constitute the most basic type of associative queries in multidimensional data structures such as \(K\)-d trees or quadtrees. Given a query \(\mathbf {q}=(q_0,\ldots ,q_{K-1})\) where s of the coordinates are specified and \(K-s\) are left unspecified (\(q_i=*\)), a partial match search returns the subset of data points \(\mathbf {x}=(x_0,\ldots ,x_{K-1})\) in the data structure that match the given query, that is, the data points such that \(x_i=q_i\) whenever \(q_i\not =*\). There is a wealth of results about the cost of partial match searches in many different multidimensional data structures, but most of these results deal with random queries. Only recently have a few papers begun to investigate the cost of partial match queries with a fixed query \(\mathbf {q}\). This paper represents a new contribution in this direction, giving a detailed asymptotic estimate of the expected cost \(P_{{n},\mathbf {q}}\) for a given fixed query \(\mathbf {q}\). From previous results on the cost of partial matches with a fixed query and the ones presented here, a deeper understanding is emerging, uncovering the following functional shape for \(P_{{n},\mathbf {q}}\)

$$\begin{aligned} P_{{n},\mathbf {q}} = \nu \cdot \left( \prod _{i:q_i\text { is specified}}\, q_i(1-q_i)\right) ^{\alpha /2}\cdot n^\alpha + \text {l.o.t.} \end{aligned}$$

(here and throughout this work, l.o.t. stands for lower order terms) in many multidimensional data structures, which differ only in the exponent \(\alpha \) and the constant \(\nu \), both dependent on s and K, and, for some data structures, on the whole pattern of specified and unspecified coordinates in \(\mathbf {q}\) as well. Although it is tempting to conjecture that this functional shape is “universal”, our experiments suggest that it does not hold for a variant of \(K\)-d trees called squarish \(K\)-d trees.
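For concreteness, the following minimal sketch (ours, purely illustrative, with arbitrary parameter choices; it is not the authors' code) builds a standard K-d tree with cyclic discriminants over uniform points in \([0,1)^K\) and runs a partial match search; the value it returns is the number of visited nodes, i.e. the cost quantity \(P_{{n},\mathbf {q}}\) studied here. With continuous coordinates exact matches essentially never occur, so only the visited-node count is of interest.

```python
# Illustrative sketch (not the authors' code): standard K-d tree, cyclic discriminants.
import random

class Node:
    __slots__ = ("point", "left", "right")
    def __init__(self, point):
        self.point, self.left, self.right = point, None, None

def insert(root, point, depth=0, K=2):
    """Insert a point, discriminating on coordinate depth mod K."""
    if root is None:
        return Node(point)
    i = depth % K
    if point[i] < root.point[i]:
        root.left = insert(root.left, point, depth + 1, K)
    else:
        root.right = insert(root.right, point, depth + 1, K)
    return root

def partial_match(root, query, depth=0, K=2):
    """Return the number of visited nodes; '*' marks an unspecified coordinate."""
    if root is None:
        return 0
    visited = 1
    i = depth % K
    if query[i] == "*":                     # unspecified: both subtrees are visited
        visited += partial_match(root.left, query, depth + 1, K)
        visited += partial_match(root.right, query, depth + 1, K)
    elif query[i] < root.point[i]:          # specified: one subtree suffices
        visited += partial_match(root.left, query, depth + 1, K)
    else:
        visited += partial_match(root.right, query, depth + 1, K)
    return visited

if __name__ == "__main__":
    random.seed(1)
    K, n = 2, 50_000
    root = None
    for _ in range(n):
        root = insert(root, tuple(random.random() for _ in range(K)), 0, K)
    for q0 in (0.05, 0.25, 0.5):            # fixed queries (q0, *)
        print(f"q = ({q0}, *): visited nodes =", partial_match(root, (q0, "*"), 0, K))
```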


Notes

  1. The algorithm to perform a partial match query is a partial match search; however, sometimes we will abuse the terminology and use the term partial match query when we should actually say partial match search.

  2. For brevity, we do not make the distinction between regular and extreme coordinates; \(\ell _0\), ..., \(\ell _{s-1}\) give the indices of specified coordinates.

  3. From now on, we shall omit the subscript \(\mathbf {u}\) of f to simplify notation.

References

  1. Bentley, J.L.: Multidimensional binary search trees used for associative retrieval. Commun. ACM 18(9), 509–517 (1975)


  2. Bentley, J.L., Finkel, R.A.: Quad trees: a data structure for retrieval on composite keys. Acta Inform. 4(1), 1–9 (1974)


  3. Broutin, N., Neininger, R., Sulzbach, H.: A limit process for partial match queries in random quadtrees and 2-d trees. Ann. Appl. Probab. 23(6), 2560–2603 (2013)


  4. Chanzy, P., Devroye, L., Zamora-Cura, C.: Analysis of range search for random \(k\)-d trees. Acta Inform. 37(4–5), 355–383 (2001)


  5. Chern, H.-H., Hwang, H.-K.: Partial match queries in random \(k\)-d trees. SIAM J. Comput. 35(6), 1440–1466 (2006)


  6. Curien, N., Joseph, A.: Partial match queries in two-dimensional quadtrees: a probabilistic approach. Adv. Appl. Probab. 43(1), 178–194 (2011)


  7. Cunto, W., Lau, G., Flajolet, Ph.: Analysis of \(k\)d-trees improved by local reorganisations. In: Dehne, F., Sack, J.-R., Santoro, N. (eds.), Workshop on Algorithms and Data Structures (WADS’89), Volume 382 of Lecture Notes in Computer Science, pp. 24–38. Springer (1989)

  8. Duch, A., Estivill-Castro, V., Martínez, C.: Randomized \(K\)-dimensional binary search trees. In: International Symposium on Algorithms and Computation (ISAAC), Volume 1533 of Lecture Notes in Computer Science, pp. 199–208. Springer (1998)

  9. Duch, A., Jiménez, R.M., Martínez, C.: Selection by rank in \(k\)-dimensional binary search trees. Random Struct. Algorithms 45(1), 14–37 (2014)


  10. Devroye, L., Jabbour, J., Zamora-Cura, C.: Squarish \(k\)-d trees. SIAM J. Comput. 30(5), 1678–1700 (2000)


  11. Duch, A., Lau, G., Martínez, C.: On the average performance of fixed partial match queries in random relaxed \(K\)-d trees. In: International Meeting on Probabilistic, Combinatorial and Asymptotic Methods for the Analysis of Algorithms (AofA). Discrete Mathematics & Theoretical Computer Science (Proceedings), pp. 103–114 (2014)

  12. Duch, A., Martínez, C.: On the average performance of orthogonal range search in multidimensional data structures. J. Algorithms 44(1), 226–245 (2002)


  13. Feller, W.: An Introduction to Probability Theory and Its Applications. Wiley, New York (1971)


  14. Flajolet, Ph., Odlyzko, A.: Singularity analysis of generating functions. SIAM J. Discrete Math. 3(1), 216–240 (1990)


  15. Flajolet, Ph., Puech, C.: Partial match retrieval of multidimensional data. J. ACM 33(2), 371–407 (1986)


  16. Flajolet, Ph., Sedgewick, R.: Analytic Combinatorics. Cambridge University Press, Cambridge (2009)


  17. Johnson, N.L., Kotz, S., Kemp, A.W.: Univariate Discrete Distributions, 2nd edn. Wiley, New York (1992)


  18. Martínez, C., Panholzer, A., Prodinger, H.: Partial match queries in relaxed multidimensional search trees. Algorithmica 29(1–2), 181–204 (2001)



Acknowledgments

We are very thankful to the two anonymous reviewers of this manuscript for their detailed reports and useful suggestions.


Corresponding author

Correspondence to Conrado Martínez.

Additional information

This work has been partially supported by funds from the Spanish Ministry for Economy and Competitiveness (MINECO), the European Union (FEDER funds) under Grant COMMAS (Ref. TIN2013-46181-C2-1-R), and the Catalan Agency for Management of Research and University Grants (AGAUR) Grant SGR 2014:1034 (ALBCOM).

Appendices

Appendix 1: Getting the Integral Equation (8)

The hypothesis of Proposition 1 is that there exists some \(\gamma \) such that

$$\begin{aligned} \lim _{n\rightarrow \infty }n^{-\gamma }P_{{n},\mathbf {r}}=f(z_0,\ldots ,z_{t-1}) \end{aligned}$$

exists and is not identically null, with \(z_i = \lim _{n\rightarrow \infty } r_i/n\in (0,1)\), \(0\le i < t\). We shall also assume here that all \(r_i \le n/2\), for otherwise we can replace \(r_i\) by \(n-r_i\).

First of all, in the asymptotic regime, when \(r_i=o(n)\), the probability that we recursively continue the PM search in the right subtree is o(1), so we can assume that all extreme ranks are \(r_i=0\) (\(t\le i < s\)), and thus we can rewrite the recurrence as

$$\begin{aligned}&P_{{n},\mathbf {r}} \sim 1 + \frac{1}{nK}\Biggl [\sum _{0\le i < t} \Bigl (\sum _{j=r_i}^{n-1} \sum _{\mathbf {r'}\in \mathcal {L}^{({i,j})}_\mathbf {r}}\pi _L^{({i,j})}(\mathbf {r},\mathbf {r'})P_{{j},\mathbf {r'}}\\&\qquad \qquad \quad +\sum _{j=0}^{r_i-1} \sum _{\mathbf {r'}\in \mathcal {R}^{({i,j})}_\mathbf {r}}\pi _R^{({i,j})}(\mathbf {r},\mathbf {r'})P_{{n-1-j},\mathbf {r'}}\Bigr ) \\&\qquad \qquad \quad +\sum _{t\le i < s}\sum _{j=0}^{n-1} \sum _{\mathbf {r'}\in \mathcal {L}^{({i,j})}_\mathbf {r}}\pi _L^{({i,j})}(\mathbf {r},\mathbf {r'})P_{{j},\mathbf {r'}}\\&\qquad \qquad \quad +\sum _{s\le i < K}\sum _{j=0}^{n-1}\sum _{\langle \mathbf {r'},\mathbf {r''}\rangle \in \mathcal {B}^{({i,j})}_\mathbf {r}} \pi _{B}^{({i,j})} (\mathbf {r},\mathbf {r'},\mathbf {r''})\left( P_{{j},\mathbf {r'}} +P_{{n-1-j},\mathbf {r''}}\right) \Biggr ]. \end{aligned}$$

A second simplification comes from the realization that \(\pi _L^{({i,j})}(\mathbf {r},\mathbf {r'})\), \(\pi _R^{({i,j})}(\mathbf {r},\mathbf {r'})\) are highly concentrated around the expected value of \(\mathbf {r'}\); in particular,

$$\begin{aligned} \sum _{\mathbf {r'}\in \mathcal {L}^{({i,j})}_\mathbf {r}}\pi _L^{({i,j})}(\mathbf {r},\mathbf {r'})P_{{j},\mathbf {r'}} \sim P_{{j},\mathbf {\overleftarrow{\mathbf {r}}}} \end{aligned}$$

where \(\overleftarrow{r_i}=r_i\) and \(\overleftarrow{r_k}= \frac{j}{n}r_k\) for \(k\not = i\).
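As a quick numerical sanity check of this concentration (the sizes below are illustrative, not values from the paper): each component \(r'_k\), \(k\not =i\), follows a hypergeometric law whose mean is essentially \((j/n)r_k\), the value used for \(\overleftarrow{r_k}\), while its standard deviation is only \(\Theta (\sqrt{n})\), so the relative spread vanishes as \(n\rightarrow \infty \).

```python
# Sanity check with illustrative sizes: the split distribution of a rank r_k between a
# subtree of size j and its complement is hypergeometric, mean (j/n) r_k, std Theta(sqrt(n)).
from scipy.stats import hypergeom

n, j, r_k = 1_000_000, 400_000, 300_000
split = hypergeom(M=n, n=j, N=r_k)       # P(r'_k = v) proportional to C(j,v)C(n-j,r_k-v)
print("mean       :", split.mean(), "  vs (j/n)*r_k =", j / n * r_k)
print("std dev    :", split.std())       # ~ sqrt(n): tiny compared with the mean
print("rel. spread:", split.std() / split.mean())
```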

Similarly,

$$\begin{aligned} \sum _{\mathbf {r'}\in \mathcal {R}^{({i,j})}_\mathbf {r}}\pi _R^{({i,j})}(\mathbf {r},\mathbf {r'})P_{{n-1-j},\mathbf {r'}} \sim P_{{n-1-j},\mathbf {\overrightarrow{\mathbf {r}}}}, \end{aligned}$$

where \(\overrightarrow{r_i}=r_i-j-1\) and \(\overrightarrow{r_k}= \frac{n-1-j}{n}r_k\) for \(k\not = i\). Last but not least,

$$\begin{aligned}&\sum _{\mathbf {r'}\in \mathcal {L}^{({i,j})}_\mathbf {r}} \sum _{\varvec{\delta }\in \{0,1\}^s} \prod _{0\le k < s}h(\delta _k) \frac{\left( {\begin{array}{c}j\\ r'_k\end{array}}\right) \left( {\begin{array}{c}n-1-j\\ r_k-\delta _k-r'_k\end{array}}\right) }{\left( {\begin{array}{c}n-1\\ r_k-\delta _k\end{array}}\right) } \left( P_{{j},\mathbf {r'}} +P_{{n-1-j},\mathbf {r-r'-\varvec{\delta }}}\right) \\&\qquad \sim P_{{j},\mathbf {\overleftrightarrow {\mathbf {r}}}} +P_{{n-1-j},\mathbf {r-\overleftrightarrow {\mathbf {r}}}}, \end{aligned}$$

where \(\overleftrightarrow {r_k}= \frac{j}{n}r_k\), \(0\le k < s\). Setting \(C_{{n},\mathbf {r}}:= 0\) if \(n=0\), and \(C_{{n},\mathbf {r}}:= n^{-\gamma }P_{{n},\mathbf {r}}\) if \(n > 0\),

$$\begin{aligned} C_{{n},\mathbf {r}}\sim & {} \frac{1}{n^\gamma }+\frac{1}{nK}\Biggl [\sum _{0\le i < t} \Biggl (\sum _{j=r_i}^{n-1} C_{{j},\mathbf {\overleftarrow{\mathbf {r}}}}\cdot \left( \frac{j}{n}\right) ^\gamma +\sum _{j=0}^{r_i-1} C_{{n-1-j},\mathbf {\overrightarrow{\mathbf {r}}}}\cdot \left( \frac{n-1-j}{n}\right) ^\gamma \Biggr )\Biggr ] \nonumber \\&+\frac{s_0}{nK}\cdot \sum _{j=0}^{n-1} C_{{j},\mathbf {\overleftarrow{\mathbf {r}}}} \cdot \left( \frac{j}{n}\right) ^\gamma \nonumber \\&+ \frac{(K-s)}{nK}\cdot \sum _{j=0}^{n-1}\left( C_{{j},\mathbf {\overleftrightarrow {\mathbf {r}}}} \cdot \left( \frac{j}{n}\right) ^\gamma +C_{{n-1-j},\mathbf {r-\overleftrightarrow {\mathbf {r}}}}\cdot \left( \frac{n-1-j}{n}\right) ^\gamma \right) . \end{aligned}$$
(13)

Notice that when \(t\le i < s\), we assume \(r_i=0\) and thus \(\overleftarrow{r_i}=0\) as well. Now, since \(t> 0\),

$$\begin{aligned} \lim _{n\rightarrow \infty } \frac{P_{{n},\mathbf {r}}}{n^\gamma }=f(z_0,\ldots ,z_{t-1}) \end{aligned}$$

exists and is not identically null, by hypothesis. If we replace \(C_{{n},\mathbf {r}}\) with \(f\left( \frac{r_0}{n},\ldots ,\frac{r_{t-1}}{n}\right) \), then

$$\begin{aligned}&f\left( \frac{r_0}{n},\ldots ,\frac{r_{t-1}}{n}\right) \sim \frac{1}{n^\gamma } \\&\quad +\,\frac{1}{nK}\Biggl [\sum _{0\le i < t} \Biggl (\sum _{j=r_i}^{n-1} f\left( \frac{r_0}{n},\ldots ,\frac{r_i}{j},\ldots ,\frac{r_{t-1}}{n}\right) \cdot \left( \frac{j}{n}\right) ^\gamma \\&\quad +\,\sum _{j=0}^{r_i-1} f\left( \frac{r_0}{n},\ldots ,\frac{r_i-j-1}{n-1-j},\ldots , \frac{r_{t-1}}{n}\right) \cdot \left( \frac{n-1-j}{n}\right) ^\gamma \Biggr )\Biggr ] \\&\quad +\,\frac{s_0}{nK}\cdot \sum _{j=0}^{n-1} f\left( \frac{r_0}{n},\ldots ,\frac{r_{t-1}}{n}\right) \cdot \left( \frac{j}{n}\right) ^\gamma \\&\quad +\,\frac{(K-s)}{nK}\cdot \sum _{j=0}^{n-1}\Biggl ( f\left( \frac{r_0}{n},\ldots ,\frac{r_{t-1}}{n}\right) \cdot \left( \frac{j}{n}\right) ^\gamma \\&\quad +\,f\left( \frac{r_0}{n},\ldots ,\frac{r_{t-1}}{n}\right) \cdot \left( \frac{n-1-j}{n}\right) ^\gamma \Biggr ). \end{aligned}$$

Passing to the limit when \(n\rightarrow \infty \), with \(z_i = \lim _{n\rightarrow \infty } (r_i/n)\), we replace sums by integrals and thus

$$\begin{aligned} f\left( z_0,\ldots ,z_{t-1}\right)&= \frac{1}{K}\Biggl [\sum _{0\le i < t} \Biggl ( \int _{z_i}^{1}f\left( z_0,\ldots ,\frac{z_i}{z},\ldots ,z_{t-1}\right) \cdot z^\gamma \,dz \\&\qquad +\int _{0}^{z_i} f\left( z_0,\ldots ,\frac{z_i-z}{1-z},\ldots ,z_{t-1}\right) \cdot \left( 1-z\right) ^\gamma \,dz\Biggr ) \\&\qquad +s_0\cdot \int _0^1 f\left( z_0,\ldots ,z_{t-1}\right) \cdot z^\gamma \,dz\\&\qquad + (K-s)\cdot \int _0^1 \Bigl (f\left( z_0,\ldots ,z_{t-1}\right) \cdot z^\gamma \\&\qquad +f\left( z_0,\ldots ,z_{t-1}\right) \cdot \left( 1-z\right) ^\gamma \Bigr )\,dz\Biggr ], \end{aligned}$$

which can be further manipulated to give

$$\begin{aligned} f\left( z_0,\ldots ,z_{t-1}\right)&= \frac{1}{K}\Biggl [\sum _{0\le i < t} \Biggl ( \int _{z_i}^{1}f\left( z_0,\ldots ,\frac{z_i}{z},\ldots ,z_{t-1}\right) \cdot z^\gamma \,dz \\&\quad +\int _{0}^{z_i} f\left( z_0,\ldots ,\frac{z_i-z}{1-z},\ldots ,z_{t-1}\right) \cdot \left( 1-z\right) ^\gamma \,dz\Biggr ) \\&\quad +s_0\cdot f\left( z_0,\ldots ,z_{t-1}\right) \cdot \frac{1}{\gamma +1} \\&\quad + (K-s)\cdot f\left( z_0,\ldots ,z_{t-1}\right) \cdot \frac{2}{\gamma +1}\Biggr ]. \end{aligned}$$

Hence

$$\begin{aligned} f(z_0,\ldots ,z_{t-1})&= \lambda \Biggl [\sum _{0\le i < t} \Bigl ( \int _{z_i}^{1}f\left( z_0,\ldots ,\frac{z_i}{z},\ldots ,z_{t-1}\right) \cdot z^\gamma \,dz \\&\quad +\int _{0}^{z_i} f\left( z_0,\ldots ,\frac{z_i-z}{1-z},\ldots ,z_{t-1}\right) \cdot \left( 1-z\right) ^\gamma \,dz\Bigr )\Biggr ], \end{aligned}$$

with

$$\begin{aligned} \lambda =\frac{1}{K}\frac{1}{1-\frac{2(K-s)+s_0}{K(\gamma +1)}}. \end{aligned}$$

Furthermore,

$$\begin{aligned} f(z_0,\ldots ,z_{t-1})&= \lambda \sum _{i=0}^{t-1} \Biggl \{z_i^{\gamma +1}\int _{z_i}^{1} f(z_0,\ldots ,z_{i-1},z,z_{i+1},\ldots ,z_{t-1})\frac{dz}{z^{\gamma +2}} \\&\qquad + (1-z_i)^{\gamma +1}\int _0^{z_i} f(z_0,\ldots ,z_{i-1},z,z_{i+1},\ldots ,z_{t-1})\frac{dz}{(1-z)^{\gamma +2}} \Biggr \}, \end{aligned}$$

with the substitution \(z:=z_i/z\) in the first integral and \(z:=(z_i-z)/(1-z)\) in the second.

Besides the integral equation (8), the properties of \(P_{{n},\mathbf {r}}\) translate into several constraints that \(f(z_0,\ldots ,z_{t-1})\) must satisfy:

  (a) The function f is symmetric with respect to any permutation of its arguments.

  (b) For any i, \(0\le i < t\), and \(z_i\in (0,1)\), \(f(z_0,\ldots ,z_{i-1},z_i,z_{i+1},\ldots ,z_{t-1}) = f(z_0,\ldots ,z_{i-1},1-z_i,z_{i+1},\ldots ,z_{t-1})\).

  (c) For any i, \(0\le i < t\),

    $$\begin{aligned}&\lim _{z_i\rightarrow 0} f(z_0,\ldots ,z_{i-1},z_i,z_{i+1},\ldots ,z_{t-1})\\&\qquad =\lim _{z_i\rightarrow 1} f(z_0,\ldots ,z_{i-1},z_i,z_{i+1},\ldots ,z_{t-1}) = 0. \end{aligned}$$

  (d)

    $$\begin{aligned} \int _0^1\cdots \int _0^1 f(y_0,\ldots ,y_{t-1})\,dy_0\cdots dy_{t-1} = \beta (\rho ,\rho _0). \end{aligned}$$

Constraint (d) follows because of (6). In fact, we must have \(\gamma =\alpha (\rho ,\rho _0)\), for otherwise we would contradict our hypothesis: either \(\lim _{n\rightarrow \infty } P_{{n},\mathbf {r}}/n^{\gamma }=0\), or the limit does not exist (it diverges to \(\infty \)). Also, because \(\gamma =\alpha (\rho ,\rho _0)=: \alpha \) we must have

$$\begin{aligned} \lambda = \frac{\alpha +2}{2t}, \end{aligned}$$

since

$$\begin{aligned} 1-\frac{2(K-s)+s_0}{K(\gamma +1)} = \frac{2t}{K(\alpha +2)}. \end{aligned}$$
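The identity above pins \(\alpha \) down as the root of a quadratic. The short sympy computation below is a purely mechanical rearrangement of the displayed identity with \(\gamma =\alpha \), \(\rho =s/K\), \(\rho _0=s_0/K\) and \(t=s-s_0\) (it makes no claim about Eq. (6) itself), and it also checks consistency with the base case \(P_{{n},\mathbf {r}}=\Theta (n^{1-\rho _0})\) derived below.

```python
# Mechanical rearrangement of the displayed identity (per-coordinate form, divided by K).
import sympy as sp

alpha = sp.Symbol('alpha')
rho, rho0 = sp.symbols('rho rho_0', positive=True)
lhs = 1 - (2 * (1 - rho) + rho0) / (alpha + 1)     # 1 - (2(K-s)+s_0)/(K(alpha+1))
rhs = 2 * (rho - rho0) / (alpha + 2)               # 2t/(K(alpha+2)) with t = s - s_0
poly = sp.expand((lhs - rhs) * (alpha + 1) * (alpha + 2))
print(sp.collect(poly, alpha))             # -> alpha^2 + (1 + rho_0) alpha - 2(1 - rho)
print(sp.solve(sp.Eq(poly, 0), alpha))     # roots; the nonnegative one is consistent
print(sp.simplify(poly.subs({alpha: 1 - rho0, rho: rho0})))   # -> 0: base case rho = rho_0
```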

Constraint (c) follows from inductive reasoning. Suppose that for any rank vector \(\mathbf {r'}\) with \(s_0+1\) extreme values we have \(P_{{n},\mathbf {r'}}=\Theta \left( n^{\alpha (\rho ,\rho _0+1/K)}\right) \). Since setting \(z_i=0\) or \(z_i=1\) corresponds to one more extreme rank, normalizing \(P_{{n},\mathbf {r}}\) by \(n^{\alpha (\rho ,\rho _0)}\) yields that f is 0 there, because \(\alpha (\rho ,\rho _0+1/K) < \alpha (\rho ,\rho _0)\). To prove the basis of this induction, we must analyze the case when all the specified coordinates of a query are extreme. The recurrence for \(P_{{n},\mathbf {r}}\) in this case (\(s_0 = s\)) is greatly simplified. Indeed, for such queries we have

$$\begin{aligned} P_{{n},\mathbf {r}} = 1+\frac{s}{nK}\sum _{j=0}^{n-1}P_{{j},\mathbf {r}}+ \frac{K-s}{nK}\sum _{j=0}^{n-1}\left( P_{{j},\mathbf {r}}+P_{{n-1-j},\mathbf {r}}\right) \end{aligned}$$

as the query (actually, its rank vector) does not change as we proceed recursively with the PM search; moreover, whenever the discriminant at the root is one of the specified extreme coordinates we will systematically continue in the left subtree. The solution of the recurrence above is straightforward:

$$\begin{aligned} P_{{n},\mathbf {r}} = \Theta (n^{1-\rho _0}), \end{aligned}$$

that is, \(P_{{n},\mathbf {r}}=\Theta \left( n^{\alpha (\rho _0,\rho _0)}\right) \), as we wanted to show. It is also interesting to note that constraint (c) can be proved as a consequence of the symmetries (a) and (b), and that the symmetries of the weights of the recurrence lead to constraints (a) and (b).
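A direct numerical iteration of this one-dimensional recurrence confirms the exponent. The sketch below uses illustrative parameters of our choosing (K = 3 and \(s=s_0=2\), so \(\rho _0=2/3\)); both sums collapse into a single prefix sum, and the doubling ratio \(\log _2(P_{2n}/P_n)\) slowly approaches \(1-\rho _0\).

```python
import math

# Illustrative parameters (ours): K = 3, s = s_0 = 2, predicted growth Theta(n^(1/3)).
K, s = 3, 2
N = 1 << 20
c = (s + 2 * (K - s)) / K          # combined coefficient (2K - s)/K of (1/n) sum_{j<n} P_j
P = [0.0] * (N + 1)                # P[0] = 0 (empty tree)
prefix = 0.0                       # running sum P[0] + ... + P[n-1]
for n in range(1, N + 1):
    P[n] = 1.0 + c * prefix / n
    prefix += P[n]
for n in (N // 8, N // 4, N // 2):
    est = math.log(P[2 * n] / P[n], 2)     # converges slowly: lower order terms are O(1)
    print(n, "doubling exponent:", round(est, 4), "  target 1 - rho_0 =", round(1 - s / K, 4))
```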

Appendix 2: Solving the Integral Equation (8)

In order to solve the integral equation (8) given in Proposition 1, together with constraints (a)–(d) we transform it into an equivalent partial differential equation (PDE).

For any function \(f(z_0,z_1,\ldots ,z_{t-1})\) let

$$\begin{aligned} L_i[f]:= z_i^{\alpha +1}\int _{z_i}^{1} f\left( z_0,\ldots ,z_{i-1},z,z_{i+1},\ldots ,z_{t-1}\right) \,\frac{dz}{z^{\alpha +2}}, \end{aligned}$$

and, similarly let

$$\begin{aligned} R_i[f]:= (1-z_i)^{\alpha +1}\int _{0}^{z_i} f\left( z_0,\ldots ,z_{i-1},z,z_{i+1},\ldots ,z_{t-1}\right) \,\frac{dz}{(1-z)^{\alpha +2}}. \end{aligned}$$

If we set \(T:= \lambda \sum _{i=0}^{t-1}(L_i + R_i)\) where \(\lambda = \frac{\alpha +2}{2t}\) then the function f we are looking for is a non-trivial solution to the fixed-point equation \(f = T[f]\) with the constraints (a)–(d).

Let us now assume that the solution to the integral equation has separable variables, namely \(f(z_0,z_1,\ldots ,z_{t-1}) = \phi _0(z_0)\cdot \phi _1(z_1)\cdots \phi _{t-1}(z_{t-1})\). Because of the symmetry of f (constraint (a)), we can safely assume \(\phi _0 = \phi _1 = \cdots = \phi _{t-1} =: \phi \). Furthermore, because of constraint (b), we must have \(\phi (z) = \phi (1-z)\) for any \(z\in (0,1)\). We must also have \(\lim _{z\rightarrow 0} \phi (z) = 0\) to satisfy constraint (c).

Going back to the integral equation, if we denote \(\phi _i:= \phi (z_i)\) we must have

$$\begin{aligned} \phi _0\cdot \phi _1\cdots \phi _{t-1} = \lambda \sum _{i=0}^{t-1}\phi _0\cdots \phi _{i-1}\cdot \phi _{i+1}\cdots \phi _{t-1}\Bigl (L_i[\phi _i]+R_i[\phi _i]\Bigr ). \end{aligned}$$

If, for all i, \(0\le i < t\),

$$\begin{aligned} \phi _i = t\lambda \Bigl (L_i[\phi _i]+R_i[\phi _i]\Bigr ), \end{aligned}$$
(14)

then

$$\begin{aligned} \lambda \sum _{i=0}^{t-1} \phi _0\cdots \phi _{i-1}\cdot \phi _{i+1} \cdots \phi _{t-1}\cdot \Bigl (L_i[\phi _i]+R_i[\phi _i]\Bigr ) = \lambda \sum _{i=0}^{t-1}\frac{\phi _0\cdots \phi _{t-1}}{t\lambda } = \phi _0\cdots \phi _{t-1}. \end{aligned}$$

The solution of (14), namely, the solution of

$$\begin{aligned} \phi (z) = t\lambda \left( z^{\alpha +1}\int _{z}^{1} \phi (u)\frac{du}{u^{\alpha +2}}+(1-z)^{\alpha +1}\int _0^{z}\phi (u) \frac{du}{(1-u)^{\alpha +2}}\right) \end{aligned}$$

can be obtained by solving the equivalent ordinary differential equation that results from applying the operator

$$\begin{aligned} \Phi _i[g(z_i)]:= z_i(1-z_i)\frac{d^2g}{dz_i^2} + \alpha (2z_i-1)\frac{dg}{dz_i}-\alpha (\alpha +1)g(z_i), \end{aligned}$$

to both sides. The linear operator \(\Phi \) allows us to remove the integrals in \(L_i\) and \(R_i\):

$$\begin{aligned} \Phi _i[L_i[g]]&= (z_i-1)\frac{dg}{dz_i}-\alpha g\\ \Phi _i[R_i[g]]&= z_i\frac{dg}{dz_i}-\alpha g\\ \Phi _i[(L_i+R_i)[g]]&= (2z_i-1)\frac{dg}{dz_i}-2\alpha g. \end{aligned}$$

In particular, we obtain the following ODE for \(\phi (z)\), after rearranging:

$$\begin{aligned} z(1-z)\phi ''(z)+\alpha (2z-1)\phi '(z)-\alpha (\alpha +1)\phi (z) = t\lambda \Bigl ((2z-1)\phi '(z)-2\alpha \phi (z)\Bigr ), \end{aligned}$$

or more conveniently,

$$\begin{aligned} z(1-z)\phi ''(z)+(\alpha -t\lambda )(2z-1)\phi '(z) -\alpha (\alpha +1-2t\lambda )\phi (z) = 0, \end{aligned}$$

with the initial condition \(\phi (0)=0\). Again we have a second order linear hypergeometric ODE, and without too much effort, as in [9], we can obtain the solution \(\phi (z) = \mu \left( z(1-z)\right) ^{\alpha /2}\), for some constant \(\mu \) and \(\alpha =\alpha (\rho ,\rho _0)\). We have thus

$$\begin{aligned} f(z_0,\ldots ,z_{t-1}) = \nu _{s,t, K} \left( \prod _{i=0}^{t-1}z_i(1-z_i)\right) ^{\alpha /2}, \end{aligned}$$

with \(\nu _{s,t, K}:= \mu ^{t}\). This family of solutions (parameterized by the “arbitrary” \(\nu _{s,t, K}\)) obviously satisfies constraints (a), (b) and (c). Constraint (d) yields the sought function, as we impose

$$\begin{aligned} \nu _{s,t, K}\frac{\Gamma ^{2t}(\alpha /2+1)}{\Gamma ^t(\alpha +2)}= \beta (\rho ,\rho _0). \end{aligned}$$
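The separable solution can also be checked numerically. The sketch below uses an arbitrary positive test value of \(\alpha \) (not a value taken from the paper); it verifies that \(\phi (z)=(z(1-z))^{\alpha /2}\) satisfies the fixed-point equation (14) with \(t\lambda =(\alpha +2)/2\), and that \(\int _0^1\phi (z)\,dz=\Gamma ^2(\alpha /2+1)/\Gamma (\alpha +2)\), the one-dimensional factor behind the normalization above.

```python
import math
from scipy.integrate import quad

alpha = 0.73                                  # arbitrary positive test value
phi = lambda z: (z * (1 - z)) ** (alpha / 2)

def rhs(z):
    """(alpha+2)/2 * (L[phi](z) + R[phi](z)), with L and R as defined above."""
    L = z ** (alpha + 1) * quad(lambda u: phi(u) / u ** (alpha + 2), z, 1)[0]
    R = (1 - z) ** (alpha + 1) * quad(lambda u: phi(u) / (1 - u) ** (alpha + 2), 0, z)[0]
    return (alpha + 2) / 2 * (L + R)

for z in (0.1, 0.25, 0.5, 0.8):
    print(f"z = {z}:  rhs = {rhs(z):.6f}   phi(z) = {phi(z):.6f}")

print("int_0^1 phi =", quad(phi, 0, 1)[0],
      "  Gamma^2(alpha/2+1)/Gamma(alpha+2) =",
      math.gamma(alpha / 2 + 1) ** 2 / math.gamma(alpha + 2))
```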

Appendix 3: Bounding the Errors

Once we have an explicit form for \(f(\mathbf {z}):= f(z_0,\ldots ,z_{t-1})\), we can compute error bounds for the successive approximations that led us from the recurrence in (7) to the integral equation (8). Our knowledge of the function \(f(z_0,\ldots ,z_{t-1})\) and its derivatives in (0, 1) is the key to finding these bounds. First, we can use the trapezoid rule or the Euler–Maclaurin summation formula to bound the error in passing from sums to integrals; for instance

$$\begin{aligned}&\frac{1}{n}\sum _{j=r_i}^{n-1}f\left( \frac{r_0}{n},\ldots , \frac{r_i}{j},\ldots ,\frac{r_{t-1}}{n}\right) \left( \frac{j}{n}\right) ^\gamma \\&\quad = \frac{1}{n}\int _{r_i}^{n}f\left( \frac{r_0}{n},\ldots , \frac{r_i}{u},\ldots ,\frac{r_{t-1}}{n}\right) \left( \frac{u}{n}\right) ^\gamma \,du \\&\qquad -\frac{1}{2n} f\left( \frac{r_0}{n},\ldots , \frac{r_i}{n},\ldots ,\frac{r_{t-1}}{n}\right) +O(n^{-2}), \end{aligned}$$

and similarly for the other integrals.
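A one-dimensional illustration of this sum-to-integral step (with \(t=1\), \(f=\phi \), and test values of \(\alpha \), n and \(r_i\) chosen by us): the left Riemann sum differs from the integral by \(-f(r_i/n)/(2n)\) to first order, because the integrand vanishes at the lower endpoint \(j=r_i\), where \(r_i/j=1\).

```python
from scipy.integrate import quad

# Illustrative one-dimensional check of the trapezoid-rule correction above.
alpha = 0.73
f = lambda z: (z * (1 - z)) ** (alpha / 2) if 0.0 < z < 1.0 else 0.0

n, r = 200_000, 60_000                         # r/n -> 0.3
F = lambda u: f(r / u) * (u / n) ** alpha
left_sum = sum(F(j) for j in range(r, n)) / n  # (1/n) * sum_{j=r}^{n-1} F(j)
# (1/n) * int_r^n F(u) du, after the substitution u = n x to keep quad accurate:
integral = quad(lambda x: f((r / n) / x) * x ** alpha, r / n, 1, limit=200)[0]
print("sum - integral          :", left_sum - integral)
print("predicted -f(r/n)/(2n)  :", -f(r / n) / (2 * n))
```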

Now, if we compare recurrence (7) for \(P_{{n},\mathbf {r}}\) to the recurrence (13) for \(C_{{n},\mathbf {r}}\), apart from the normalizing factor \(n^\gamma \), the difference comes from the splitting probabilities \(\pi _L^{({i,j})}(\mathbf {r},\mathbf {r'})\), \(\pi _R^{({i,j})}(\mathbf {r},\mathbf {r'})\) and \(\pi _{B}^{({i,j})} (\mathbf {r},\mathbf {r'},\mathbf {r''})\), which we argued are highly concentrated around their respective means. Here, Laplace’s method for summations can be used to bound the error in that step. For instance, take

$$\begin{aligned}&\sum _{\mathbf {r}'\in \mathcal {L}^{({i,j})}_\mathbf {r}}\pi _L^{({i,j})}(\mathbf {r},\mathbf {r'}) f\left( \frac{r'_0}{j},\ldots , \frac{r_i}{j},\ldots ,\frac{r'_{t-1}}{j}\right) \\&\quad = \nu _{s,t, K}\sum _{\mathbf {r}'\in \mathcal {L}^{({i,j})}_\mathbf {r}} \phi \left( \frac{r_i}{j}\right) \cdot \left( \prod _{\begin{array}{c} 0\le k < t\\ k\not =i \end{array}}\frac{\left( {\begin{array}{c}j\\ r'_k\end{array}}\right) \left( {\begin{array}{c}n-j\\ r_k-r'_k\end{array}}\right) }{\left( {\begin{array}{c}n\\ r_k\end{array}}\right) } \phi \left( \frac{r'_k}{j}\right) \right) , \end{aligned}$$

for j such that \(j/n\rightarrow c\) for some constant \(0 < c < 1\). Now, the right-hand side above can be rewritten as

$$\begin{aligned} \phi \left( \frac{r_i}{j}\right) \prod _{\begin{array}{c} 0\le k < t\\ k\not =i \end{array}} \sum _{0\le r'_k\le r_k} \frac{\left( {\begin{array}{c}j\\ r'_k\end{array}}\right) \left( {\begin{array}{c}n-j\\ r_k-r'_k\end{array}}\right) }{\left( {\begin{array}{c}n\\ r_k\end{array}}\right) } \phi \left( \frac{r'_k}{j}\right) \end{aligned}$$

and we can deal with each factor separately (here, the fact that \(f(z_0,\ldots ,z_{t-1})=\phi (z_0)\cdots \phi (z_{t-1})\) greatly simplifies the proof). With our assumption that \(r_k/n\rightarrow z_k\) for some \(0 < z_k < 1\), we just need to show that

$$\begin{aligned} \sum _{0\le r'_k\le r_k} \frac{\left( {\begin{array}{c}j\\ r'_k\end{array}}\right) \left( {\begin{array}{c}n-j\\ r_k-r'_k\end{array}}\right) }{\left( {\begin{array}{c}n\\ r_k\end{array}}\right) } \phi \left( \frac{r'_k}{j}\right) \sim \phi \left( \frac{r_k}{n}\right) . \end{aligned}$$
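This can be checked numerically with illustrative sizes (our choice): below, \(r'_k\) follows the hypergeometric splitting distribution spelled out next, \(\phi (z)=(z(1-z))^{\alpha /2}\) with an arbitrary positive \(\alpha \), and the expectation is already very close to \(\phi (r_k/n)\) at moderate n.

```python
import numpy as np
from scipy.stats import hypergeom

alpha = 0.73                                     # arbitrary positive test value
phi = lambda z: (z * (1 - z)) ** (alpha / 2)

for n in (10_000, 100_000, 1_000_000):
    j, r_k = int(0.4 * n), int(0.3 * n)          # illustrative proportions
    dist = hypergeom(M=n, n=j, N=r_k)            # pmf(r') = C(j,r')C(n-j,r_k-r')/C(n,r_k)
    support = np.arange(max(0, r_k - (n - j)), min(j, r_k) + 1)
    expectation = float(np.sum(dist.pmf(support) * phi(support / j)))
    print(n, " E[phi(r'/j)] =", round(expectation, 6),
          "  phi(r_k/n) =", round(phi(r_k / n), 6))
```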

The splitting probabilities are given by products of the hypergeometric distribution (owing to the independence with which coordinates of each data point are drawn)

$$\begin{aligned} \pi _L^{({i,j})}(k)= \frac{\left( {\begin{array}{c}j\\ r'_k\end{array}}\right) \left( {\begin{array}{c}n-j\\ r_k-r'_k\end{array}}\right) }{\left( {\begin{array}{c}n\\ r_k\end{array}}\right) } \end{aligned}$$

and then we can apply the following approximation of the hypergeometric by the binomial distribution [17], valid as long as \(r_k=z_k n + o(n)\) and \(j=cn + o(n)\):

$$\begin{aligned} \frac{\left( {\begin{array}{c}j\\ r'_k\end{array}}\right) \left( {\begin{array}{c}n-j\\ r_k-r'_k\end{array}}\right) }{\left( {\begin{array}{c}n\\ r_k\end{array}}\right) }= \left( {\begin{array}{c}r_k\\ r'_k\end{array}}\right) \left( \frac{j}{n}\right) ^{r'_k} \left( 1-\frac{j}{n}\right) ^{r_k-r'_k} \left( 1+\frac{r'_k-(r'_k-\overline{r}_k)^2}{2j}+ O\left( \frac{1}{j^2}\right) \right) , \end{aligned}$$

where \(\overline{r}_k=r_k\frac{j}{n}\) is the mean value of the hypergeometric distribution.

If we divide the range of summation of \(r'_k\) into three parts, from 0 to \(\overline{r}_k-\Delta -1\), from \(\overline{r}_k-\Delta \) to \(\overline{r}_k+\Delta \) and from \(\overline{r}_k+\Delta +1\) to \(r_k\), we can consider the three parts separately, with the main contribution coming from the middle range. In particular, we need \(\Delta ^3/n^2\rightarrow 0\) as \(n\rightarrow \infty \), that is \(\Delta =o(n^{2/3})\), to be able to apply the de Moivre–Laplace limit theorem to the middle sum. With \(\sigma =r_k\frac{j}{n}\left( 1-\frac{j}{n}\right) \), we have that the middle sum is

$$\begin{aligned}&\sum _{r'_k=\overline{r}_k-\Delta }^{\overline{r}_k+\Delta } \frac{1}{\sqrt{2\pi \sigma }} e^{-(r'_k-\overline{r}_k)^2/(2\sigma )+O(\Delta ^3/n^2)+O(\Delta /n)}\\&\qquad \times \,\phi \left( \frac{r'_k}{j}\right) \left( 1+O(\Delta /n)+O(\Delta ^2/n)+ O\left( \frac{1}{n^2}\right) \right) , \end{aligned}$$

where we have also expressed the error bounds for the approximation of the hypergeometric distribution in terms of \(\Delta \); we need \(\Delta =o(\sqrt{n})\) too for the approximation to be of any use. Using \(e^x=1+O(x)\) we can write the sum above as

$$\begin{aligned} \sum _{r'_k=\overline{r}_k-\Delta }^{\overline{r}_k+\Delta } \frac{1}{\sqrt{2\pi \sigma }} e^{-(r'_k-\overline{r}_k)^2/(2\sigma )} (1+O(\Delta /n)) \phi \left( \frac{r'_k}{j}\right) \left( 1+O(\Delta ^2/n)\right) , \end{aligned}$$

since \(O(\Delta ^3/n^2)=O(\Delta /n)\) for \(\Delta =o(\sqrt{n})\). Finally, we can expand \(\phi (r'_k/j)=\phi (\overline{r}_k/j + y/j)\) for \(y= r'_k-\overline{r}_k\), \(y\in [-\Delta ,\Delta ]\) as \(\phi (r'_k/j)=\phi (\overline{r}_k/j)+O(\Delta /n)\) to get

$$\begin{aligned}&\sum _{r'_k=\overline{r}_k-\Delta }^{\overline{r}_k+\Delta } \frac{1}{\sqrt{2\pi \sigma }} e^{-(r'_k-\overline{r}_k)^2/(2\sigma )} (1+O(\Delta /n)) \phi \left( \frac{\overline{r}_k}{j}\right) \left( 1+O(\Delta /n)\right) \left( 1+O(\Delta ^2/n)\right) \\&\quad = \phi \left( \frac{\overline{r}_k}{j}\right) (1+O(\Delta ^2/n))= \phi \left( \frac{\overline{r}_k}{j}\right) (1+o(1)). \end{aligned}$$

To complete this part of the analysis we only need to show that the other two sums (with \(r'_k<\overline{r}_k-\Delta \) and \(r'_k > \overline{r}_k+\Delta \)) are negligible as \(n\rightarrow \infty \). This follows since \(\phi (r'_k/j)\) is bounded by a constant, and the tails of the hypergeometric distribution (or of its binomial approximation) decay polynomially, and then exponentially, as we move away from the mean \(\overline{r}_k\). To have an error bound as small as possible it helps to take \(\Delta \) as large as possible, as long as it remains \(o(\sqrt{n})\).

We handle the other inner sums (for rank vectors in \(\mathcal {L}^{({i,j})}_\mathbf {r}\), \(\mathcal {R}^{({i,j})}_\mathbf {r}\) and \(\mathcal {B}^{({i,j})}_\mathbf {r}\)) in (7) analogously; we thus have that the error bound inside each summation on j is \((1+O(\Delta ^2/n))\), but the approximations are not valid if \(j=o(n)\). However, these terms can be disregarded, as their total contribution is negligible, since the tail of the hypergeometric distribution decays exponentially. This also justifies the assumption that all extreme ranks are \(r_i=0\) when we actually have extreme ranks \(r_i=o(n)\) (or \(r_i=n-o(n)\), but then we can take \(r_i:= n-r_i\) because of the symmetry).

Altogether, these computations show that \(C_{{n},\mathbf {r}}'=f(\mathbf {r}/n)+o(1)\) satisfies

$$\begin{aligned}&C_{{n},\mathbf {r}}' \sim o(1) +\frac{1}{nK}\Biggl [\sum _{0\le i < t} \Biggl (\sum _{j=r_i}^{n-1} C_{{j},\mathbf {\overleftarrow{\mathbf {r}}}}'\cdot \left( \frac{j}{n}\right) ^\gamma +\sum _{j=0}^{r_i-1} C_{{n-1-j},\mathbf {\overrightarrow{\mathbf {r}}}}'\cdot \left( \frac{n-1-j}{n}\right) ^\gamma \Biggr )\Biggr ] \nonumber \\&\qquad +\,\frac{s_0}{nK}\cdot \sum _{j=0}^{n-1} C_{{j},\mathbf {\overleftarrow{\mathbf {r}}}}' \cdot \left( \frac{j}{n}\right) ^\gamma \nonumber \\&\qquad +\,\frac{(K-s)}{nK}\cdot \sum _{j=0}^{n-1}\left( C_{{j},\mathbf {\overleftrightarrow {\mathbf {r}}}}' \cdot \left( \frac{j}{n}\right) ^\gamma +C_{{n-1-j},\mathbf {r-\overleftrightarrow {\mathbf {r}}}}'\cdot \left( \frac{n-1-j}{n}\right) ^\gamma \right) , \end{aligned}$$
(15)

and hence \(f(\mathbf {r}/n)\cdot n^\alpha +o(n^\alpha )\) satisfies the full recurrence (7), with toll function \(o(n^\alpha )\) instead of the toll function \(\tau _{n,\mathbf {r}}=1\).


Cite this article

Duch, A., Lau, G. & Martínez, C. On the Cost of Fixed Partial Match Queries in K-d Trees. Algorithmica 75, 684–723 (2016). https://doi.org/10.1007/s00453-015-0097-4
