Abstract
We first review existing space-efficient data structures for the orthogonal range search problem. Then, we propose two improved data structures, the first of which has better query time complexity than the existing structures and the second of which has better space complexity that matches the information-theoretic lower bound.
1 Introduction
Consider a set P of n points in the d-dimensional space \(\mathbb {R}^d\). Given an orthogonal range \(Q = \left[ l^{(Q)}_{0}, u^{(Q)}_{0}\right] \times \left[ l^{(Q)}_{1}, u^{(Q)}_{1}\right] \times \cdots \times \left[ l^{(Q)}_{d-1}, u^{(Q)}_{d-1}\right] \), the problem of answering queries for information on \(P \cap Q\), the subset of P contained in the range Q, is called the orthogonal range search problem, and is one of the fundamental problems in computational geometry.
The information obtained about \(P \cap Q\) differs depending on the query. The most basic queries are the reporting query, which enumerates all the points in \(P \cap Q\), and the counting query, which returns the number of points \(\left| P \cap Q \right| \). There are other queries such as the emptiness query, which checks whether \(P \cap Q\) is empty or not, and aggregate queries, which compute the summation, average, or variance of weights of points in the query range.
Applications of the orthogonal range search problem include database searches [21]. For example, assuming there is a database of employees of a company, then a query to count the number of employees whose duration of service is at least \(x_1\) years and at most \(x_2\) years, age is at least \(y_1\) and at most \(y_2\), and annual income is at least \(z_1\) and at most \(z_2\), can be formalized as an orthogonal range search problem. Other applications include geographical information systems, CAD, and computer graphics.
In such applications, it is common to perform multiple queries on the same point set P. We therefore consider constructing the problem as an indexing problem: Given a point set P a priori, we first construct some data structure D from P. Then, when a query range Q is given, we answer the query using the data structure D.
1.1 Existing Work
In many existing works, the number n of points is regarded as a variable for evaluating time complexity and the number d of dimensions is regarded as a constant. However, in this chapter, we regard d as a variable too. For the computation model, we use the w-bit word RAM where \(w = \mathrm {\Theta }\left(\lg n\right)\). That is, a constant number of coordinate values can be treated in constant time. Then, it takes \(\mathrm {O}\mathord {\left(d\right)}\) time to check whether a point is inside a query range.
If more space than \(\mathrm {\Theta }\left(dn\right)\) words is allowed for the space complexity of data structures and if we assume that d is a constant, then we can perform the counting and reporting queries in time polynomial in \(\log n\). Range trees [2, 14, 15, 23] are such data structures. Using the fractional cascading technique [15, 23], range trees support counting queries in \(\mathrm {O}\mathord {\left(d \log ^{d-1}n\right)}\) time and reporting queries in \(\mathrm {O}\mathord {\left(d \log ^{d-1}n + dk\right)}\) time using \(\mathrm {O}\mathord {\left(dn \log ^{d-1}n\right)}\) word space, where \(k = \left| P \cap Q \right| \), that is, the number of points enumerated by a reporting query. Although these data structures are time-efficient, it is desirable to develop more space-efficient data structures.
Some data structures having linear space complexity have been proposed. For example, quad trees [6] were the first data structures used for orthogonal range search. Unfortunately, quad trees have terrible worst-case behaviors. To overcome this, the kd-tree [1] is used. The query time complexity of the kd-tree is \(\mathrm {O}\mathord {\left(d^2 n^{\frac{d-1}{d}}\right)}\) for counting and \(\mathrm {O}\mathord {\left(d^2 n^{\frac{d-1}{d}} + dk\right)}\) for reporting [13].
These data structures store the coordinates of points separately in plain form, and therefore can be applied to the case of real-valued coordinates. However, if the coordinates take integer values from 0 to \(n-1\), then there exist data structures with even smaller space complexity and query time complexity. For example, Chazelle [4] proposed a data structure for the two-dimensional case with linear space complexity and time complexity of \(\mathrm {O}\mathord {\left(\lg n\right)}\) for counting and \(\mathrm {O}\mathord {\left(\lg n + k \lg ^{\varepsilon }n\right)}\) for reporting, where \(0< \varepsilon <1\) is any constant. Note that although the assumption that each coordinate value is an integer from 0 to \(n-1\) seems too strict, as is explained in Sect. 8.2.2, any orthogonal range search problem in d-dimensional space can be reduced to one on the \([n]^d\) grid, and therefore the assumption does not create any difficulties.
There has also been research on succinct data structures for the orthogonal range search problem. The wavelet tree [9] is a data structure which was originally proposed for representing compressed suffix arrays, and it later turned out that the wavelet tree can support various queries efficiently [18]. For the orthogonal range search problem, the wavelet tree can support counting queries in \(\mathrm {O}\mathord {\left(\lg n\right)}\) time and reporting queries in \(\mathrm {O}\mathord {\left((1+k)\lg n\right)}\) time [8]. Bose et al. [3] proposed improved succinct data structures that support counting queries in \(\mathrm {O}\mathord {\left(\lg n/\lg \lg n\right)}\) time and reporting queries in \(\mathrm {O}\mathord {\left((1+k)\lg n/\lg \lg n\right)}\) time for two-dimensional cases.
For higher dimensions, Okajima and Maruyama [20] proposed the KDW-tree, which is a succinct data structure for any dimensionality. The query time complexity of the KDW-tree is smaller than that of the kd-tree. If we assume d is a constant, counting queries take \(\mathrm {O}\mathord {\left(n^{\frac{d-2}{d}}\lg n\right)}\) time and reporting queries take \(\mathrm {O}\mathord {\left(\left( n^{\frac{d-2}{d}}+k\right) \lg n\right)}\) time. The KDW-tree has been shown to be practical by numerical experiments.
1.2 Our Results
We show space and time complexities of data structures for the orthogonal range search problem explained in Sect. 8.1.1 and our proposed data structures in Table 8.1. Note that these are for the case where the coordinates are integers from 0 to \(n-1\), and the space complexities are measured in bits. Table 8.1 shows reporting time complexities. Counting time complexities can be obtained by letting \(k=0\).
Our data structures are space-efficient for high-dimensional orthogonal range search problems.
Our first data structure has the same space complexity as the KDW-tree and better query time complexities. Note that the result in Table 8.1 is for the case of \(d \ge 3\). If \(d = 2\), we can improve the \(n^\frac{d-2}{d}\) term to \(\lg n\). This result appeared in [11].
Note that, as shown in Sect. 8.2.1, the necessary space to represent a set of n points in d-dimensional space such that each coordinate takes an integer value from 0 to \(n-1\) is \((d-1) n \lg n + \mathrm {\Theta }\left(n\right)\) bits. This means that if we assume d is a constant, the space complexity of the KDW-tree and our first data structure does not match the information-theoretic lower bound asymptotically.
Our second data structure uses \((d-1) n \lg n + (d-1)\cdot \mathrm {o}\mathord {\left(n \lg n\right)}\) bits of space. This asymptotically matches the information-theoretic lower bound even if d is assumed to be a constant. Therefore, we can say this data structure is truly succinct. Unfortunately, the worst-case query time complexity is \(\mathrm {O}\mathord {\left(dn \lg n\right)}\), which is not fast in theory. However, this data structure is fast in practice for the case where the number d of dimensions is large but the number \(d'\) of dimensions used for a query is small. This kind of query often occurs in the database search applications shown in Sect. 8.1. This result appeared in [10].
2 Preliminaries
In this chapter, we assume that the coordinates of points are nonnegative integers. As will be explained in Sect. 8.2.2, we sometimes assume that coordinates are integers from 0 to \(n-1\). Therefore, we define [n] as the set \(\{0, 1, \ldots , n-1\}\). For a d-dimensional space, we denote the dimensions by dim. 0, dim. 1, ..., dim. \(d-1\), and the coordinate values of a point by the 0th coordinate value, the 1st coordinate value, ..., the \((d-1)\)th coordinate value. For a rooted tree, we assume the depth of the root node is 0. Throughout the chapter, \(\log x\) denotes the natural logarithm and \(\lg x\) denotes the base-2 logarithm.
Next, we define two concepts used in this chapter. The first one is containment degree. This is the concept of an inclusion relationship between two orthogonal ranges introduced in [20]. For two d-dimensional orthogonal ranges \(Q = \left[ l^{(Q)}_{0}, u^{(Q)}_{0}\right] \times \cdots \times \left[ l^{(Q)}_{d-1}, u^{(Q)}_{d-1}\right] \) and \(R = \left[ l^{(R)}_{0}, u^{(R)}_{0}\right] \times \cdots \times \left[ l^{(R)}_{d-1}, u^{(R)}_{d-1}\right] \), we define \(\mathrm {CDeg}(R,Q)\) as
$$\mathrm {CDeg}(R,Q) = \left| \left\{ i \in [d] : l^{(Q)}_{i} \le l^{(R)}_{i} \text { and } u^{(R)}_{i} \le u^{(Q)}_{i} \right\} \right| $$
and call it the containment degree of R with respect to Q. This is the number of dimensions in each of which R is contained in Q. The containment degree is an important concept for analyzing time complexities of orthogonal range search algorithms.
Next, we explain the \(z\)-value. This is a projection of multidimensional data onto one-dimensional data as proposed by Morton [17]. Consider a point \(p = (p_0, p_1, \ldots , p_{d-1})\) in the d-dimensional space where the coordinate values are integers. If the coordinate values are expressed as l-bit binary numbers \(p_0 = b_0^0b_0^1\cdots b_0^{l-1},\) \(p_1=b_1^0b_1^1\cdots b_1^{l-1}, \ldots , p_{d-1} = b_{d-1}^0b_{d-1}^1\cdots b_{d-1}^{l-1}\), the \(z\)-value z(p) of point p is defined as the dl-bit binary number obtained by interleaving the bits of the coordinates:
$$z(p) = b_0^0 b_1^0 \cdots b_{d-1}^0 \; b_0^1 b_1^1 \cdots b_{d-1}^1 \; \cdots \; b_0^{l-1} b_1^{l-1} \cdots b_{d-1}^{l-1}.$$
In the case of a two-dimensional space, if we arrange grid points in increasing order of \(z\)-value, we see a z-shaped curve as shown in Fig. 8.1. We therefore call the value the \(z\)-value.
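The bit interleaving above can be sketched in a few lines. The following Python function is a hypothetical helper (not code from the chapter) that computes z(p) for a point with l-bit integer coordinates:

```python
def z_value(p, l):
    """Interleave the l-bit binary representations of the coordinates of p.

    Bit j of dimension i (counting from the most significant end) becomes
    bit j*d + i of the result, again counted from the most significant end.
    """
    d = len(p)
    z = 0
    for j in range(l):          # bit position, most significant first
        for i in range(d):      # dimension
            z = (z << 1) | ((p[i] >> (l - 1 - j)) & 1)
    return z
```

For example, with d = 2 and l = 2, the point (2, 1) has binary coordinates 10 and 01, which interleave to 1001, that is, 9.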
2.1 Succinct Data Structures and Information-Theoretic Lower Bound
Succinctness of data structures was proposed by Jacobson [12] and is one of the criteria for measuring space complexities of data structures. It is defined as follows.
Let n be the number of different values that an object can take. Then, we need at least \(\left\lceil \lg n \right\rceil \) bits of space to represent the object. If the space complexity S(n) of a data structure representing the object satisfies \(S(n) = \lg n + \mathrm {o}\mathord {\left(\lg n\right)}\) bits, we say the data structure is succinct and \(\left\lceil \lg n \right\rceil \) bits is the informationtheoretic lower bound of the size of representations of the object. Note that succinct data structures not only offer data compression, but also support some efficient queries. For orthogonal range search, a naive algorithm supports linear time queries by scanning an array containing coordinate values of points. Succinct data structures are therefore expected to answer queries in sublinear time.
The space complexity of \(\lg n + \mathrm {o}\mathord {\left(\lg n\right)}\) bits in the definition of succinct data structures indicates that the size of auxiliary indexing data structures added to the data is negligibly small compared with the size of the data itself (\(\lg n\) bits). In other words, the space complexity of succinct data structures asymptotically matches the informationtheoretic lower bound when \(n \rightarrow \infty \).
We compute the information-theoretic lower bound for representing a set of points with integer coordinates. Assume that the ith coordinate value takes integer values from 0 to \(U_i-1\). Because the number of grid points is \(\prod _{i=0}^{d-1} U_i\), the number of different sets of n points is
$$\left( {\begin{array}{c}\prod _{i=0}^{d-1} U_i\\ n\end{array}}\right) .$$
By using Stirling's approximation formula
$$\lg n! = n \lg n - n \lg \mathrm {e} + \mathrm {O}\mathord {\left(\lg n\right)},$$
we obtain
$$\lg \left( {\begin{array}{c}\prod _{i=0}^{d-1} U_i\\ n\end{array}}\right) = \sum _{i=0}^{d-1} n \lg U_i - n \lg n + \mathrm {\Theta }\left(n\right).$$
Therefore, the information-theoretic lower bound of the size for representing the point set is
$$\sum _{i=0}^{d-1} n \lg U_i - n \lg n + \mathrm {\Theta }\left(n\right)$$
bits. Note that storing the coordinate values of the points explicitly using \(\sum _{i=0}^{d-1} \left\lceil \lg U_i \right\rceil \) bits per point uses about \(n \lg n\) more bits than the information-theoretic lower bound.
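As a quick numerical sanity check on this bound, one can evaluate \(\lg \left( {\begin{array}{c}\prod U_i\\ n\end{array}}\right) \) directly. The sketch below (assuming \(U_i = n\) for all i, so the grid is \([n]^d\); variable names are illustrative) uses Python's log-gamma function and shows that explicit coordinate storage exceeds the lower bound by roughly \(n \lg n\) bits:

```python
import math

def lg_binom(m, n):
    """lg C(m, n) via log-gamma, avoiding huge intermediate factorials."""
    return (math.lgamma(m + 1) - math.lgamma(n + 1)
            - math.lgamma(m - n + 1)) / math.log(2)

n, d = 1024, 3
lower_bound = lg_binom(n ** d, n)     # lg C(n^d, n) bits
explicit = d * n * math.log2(n)       # d * n * lg(n) bits for plain storage
gap = explicit - lower_bound          # roughly n lg n - O(n) bits
```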
2.2 Assumptions on Point Sets
Because data structures such as the kd-tree or range trees that have linear or larger space complexities usually store the coordinates of points in a plain format, we do not care whether they are integers or real values. However, if we consider succinct data structures, we usually assume that coordinate values are integers from 0 to \(n-1\). We also assume that for any points \(p,q \in P\) and any \(i \in [d]\), the ith coordinate value \(p_i\) of p and the ith coordinate value \(q_i\) of q are different. Although this assumption may appear to be unrealistic and too strong, for the orthogonal range search problem, it is known that an arbitrary point set on \(\mathbb {R}^d\) can be transformed into a point set on \([n]^d\) [7].
Consider a set P of n points on \(\mathbb {R}^d\). We create another point set \(P'\) on \([n]^d\) as follows. The set \(P'\) also contains n points and there is a one-to-one correspondence between points in P and points in \(P'\). Assume that \(p \in P\) corresponds to \(p' \in P'\). Then, the ith coordinate value \(p'_i\) of \(p'\) is defined from the ith coordinate value \(p_i\) of p as
$$p'_i = \left| \left\{ q \in P : q_i < p_i \right\} \right| . \qquad (8.1)$$
That is, the ith coordinate value of \(p'\) is the number of points in P such that the ith coordinate value is smaller than \(p_i\). This is called the rank value of p with respect to the ith coordinate value, and the transformation is called the transformation into rank space. We use arrays \(C_0, C_1, \ldots ,C_{d-1}\), each of length n. The array \(C_i\) stores the ith coordinate values of points in P in increasing order.
By using the point set \(P'\) on the rank space and the arrays \(C_i\) (\(i = 0, \ldots ,d-1\)) that contain the original coordinate values of the points in P, we can reduce the problem of orthogonal range search on the original point set P to that on \(P'\). Assume that a query range \(Q = \left[ l^{(Q)}_{0}, u^{(Q)}_{0}\right] \times \cdots \times \left[ l^{(Q)}_{d-1}, u^{(Q)}_{d-1}\right] \subset \mathbb {R}^d\) is given for a point set P. From the construction of \(P'\), there exists a range \(Q' = \left[ l^{(Q')}_{0}, u^{(Q')}_{0}\right] \times \cdots \times \left[ l^{(Q')}_{d-1}, u^{(Q')}_{d-1}\right] \subset [n]^d\) such that a point \(p \in P\) is contained in Q if and only if the corresponding point \(p' \in P'\) is contained in \(Q'\). The boundaries of this \(Q'\) are computed by
$$l^{(Q')}_{i} = \left| \left\{ q \in P : q_i < l^{(Q)}_{i} \right\} \right| , \qquad u^{(Q')}_{i} = \left| \left\{ q \in P : q_i \le u^{(Q)}_{i} \right\} \right| - 1.$$
These are computed in \(\mathrm {O}\mathord {\left(d\lg n\right)}\) time by binary searches on the arrays \(C_i\). Then, the counting query is performed by using \(Q'\). For the reporting query, after finding a point \(p' \in P'\) which is included in the query range \(Q'\) in the rank space, we need to recover the original coordinates of the point \(p \in P\). This is done in \(\mathrm {O}\mathord {\left(d\right)}\) time using the arrays \(C_i\) containing the coordinates of the original points by
$$p_i = C_i\left[ p'_i\right] .$$
Thus, an orthogonal range search problem on \(\mathbb {R}^d\) can be transformed into one on \([n]^d\). Note that if coordinates are transformed as in Eq. (8.1), identical coordinate values in \(\mathbb {R}^d\) are transformed into identical coordinate values in \([n]^d\). By shifting values by one for the identical coordinate values, we can transform the coordinate values so that for any two distinct points \(p',q' \in P'\) and any \(i \in [d]\), the ith coordinate value \(p'_i\) of \(p'\) is different from the ith coordinate value \(q'_i\) of \(q'\).
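The transformation into rank space is mechanical. The following sketch (hypothetical helper names, assuming distinct coordinate values per dimension) builds \(P'\) and the arrays \(C_i\), and maps a query box into rank space with binary searches:

```python
from bisect import bisect_left, bisect_right

def to_rank_space(points):
    """Return (P', C) where P' holds the rank values of each point and
    C[i] is the sorted array of ith coordinate values (C_i in the text)."""
    d = len(points[0])
    C = [sorted(p[i] for p in points) for i in range(d)]
    # rank of p_i = number of ith coordinate values smaller than p_i
    P_rank = [tuple(bisect_left(C[i], p[i]) for i in range(d)) for p in points]
    return P_rank, C

def query_to_rank_space(Q, C):
    """Map a query box, given as (lo, hi) pairs per dimension, to rank space."""
    return [(bisect_left(C[i], lo), bisect_right(C[i], hi) - 1)
            for i, (lo, hi) in enumerate(Q)]
```

Recovering the original coordinates of a reported point p' is then just `C[i][p'[i]]` per dimension.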
If the original points have integer coordinate values, we can reduce the space [19]. Consider the case where P is a point set on \([U]^d\), that is, each coordinate value takes an integer value from 0 to \(U-1\). In this case, the point set \(P'\) in the rank space does not change. However, we store the coordinates of the original point set P in a different way. We store them using multisets \(M_0, M_1, \ldots , M_{d-1}\), each of which corresponds to one of the d dimensions. The multiset \(M_i\) stores the ith coordinate values of the points in P. We use the data structure of [22] to store multisets.
Lemma 8.1
There exists a data structure using \(n \lg (U/n) + \mathrm {O}\mathord {\left(n\right)}\) bits of space which supports a selectm query on a multiset \(M_i\) in constant time.
A selectm query on a multiset M finds the jth smallest element in M. That is, \(C_i[j]\) is obtained by finding the jth smallest element in the multiset \(M_i\). Therefore, if a query range Q on \([U]^d\) is given, it can be transformed into a query range \(Q'\) on the rank space by binary searches using selectm queries, and the original coordinate values are obtained by d selectm queries.
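Such a multiset representation can be sketched with an Elias–Fano-style encoding. The class below is an illustrative simplification (linear-time select, Python lists standing in for packed bits), not the constant-time structure of [22]:

```python
class EliasFano:
    """Simplified Elias-Fano encoding of a sorted multiset over [0, U).

    Roughly n*lg(U/n) + 2n bits: the low bits of each value are stored
    packed (a plain list here), the high bits in unary in one bit vector.
    select is linear-time in this sketch; o(n)-bit auxiliary indexes
    would make it O(1), as Lemma 8.1 states.
    """
    def __init__(self, values, U):
        n = max(len(values), 1)
        self.low_width = max((U // n).bit_length() - 1, 0)
        self.lows = [v & ((1 << self.low_width) - 1) for v in values]
        self.upper = []                   # unary-coded gaps of the high bits
        prev = 0
        for v in values:                  # values must be sorted
            high = v >> self.low_width
            self.upper += [0] * (high - prev) + [1]
            prev = high

    def select(self, j):
        """Return the j-th smallest element (0-indexed)."""
        ones = -1
        for pos, bit in enumerate(self.upper):
            ones += bit
            if bit and ones == j:
                high = pos - j            # zeros before the j-th one
                return (high << self.low_width) | self.lows[j]
        raise IndexError(j)
```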
Assume that there exists a succinct data structure \(D'\) for a point set \(P'\) on \([n]^d\). Then, the space complexity of \(D'\) is \((d-1)n \lg n + (d-1) \cdot \mathrm {o}\mathord {\left(n \lg n\right)}\) bits, as shown in Sect. 8.2.1. If we add d data structures of Lemma 8.1, the total space complexity becomes \(dn \lg U - n \lg n + (d-1) \cdot \mathrm {o}\mathord {\left(n \lg n\right)}\) bits. This is succinct for the point set P on \([U]^d\). Therefore, if there exists a succinct data structure for a point set on \([n]^d\), we can construct a succinct data structure for a point set on \([U]^d\). From here onward, we consider only point sets on \([n]^d\).
3 kd-Tree
The kd-tree [1] is a well-known data structure that partitions the space recursively. It is used not only for the orthogonal range search problem, but also for the nearest neighbor search problem.
3.1 Construction of kd-Trees
We explain the algorithm for constructing a kd-tree of a point set P for the two-dimensional case. First, we find the point p for which the x-coordinate is the median of the point set P, and store p at the root of the kd-tree. Next, we divide the set \(P \setminus \{p\}\) into two: the set \(P_{\mathrm {left}}\) that stores points with x-coordinates smaller than that of p, and the set \(P_{\mathrm {right}}\) that stores points with x-coordinates larger than that of p. We add two children \(v_{\mathrm {left}}\), \(v_{\mathrm {right}}\) to the root of the kd-tree. Next, from \(P_{\mathrm {left}}\) (\(P_{\mathrm {right}}\)), we find \(p_{\mathrm {left}}\) (\(p_{\mathrm {right}}\)) for which the y-coordinate is the median of the set, and we store \(p_{\mathrm {left}}\) (\(p_{\mathrm {right}}\)) in \(v_{\mathrm {left}}\) (\(v_{\mathrm {right}}\)). Similarly, we divide the set \(P_{\mathrm {left}} \setminus \{p_{\mathrm {left}}\}\) (\(P_{\mathrm {right}} \setminus \{p_{\mathrm {right}}\}\)) into two subsets according to y-coordinates, find the medians with respect to x-coordinates, and store them in the children of \(v_{\mathrm {left}}\) (\(v_{\mathrm {right}}\)), and repeat this recursively. Figure 8.2 shows an example of partitioning a point set.
For a d-dimensional space, we partition the space based on the first dimension, the second dimension, and so on. After partitioning the space based on the dth dimension, we use the first dimension again.
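This construction can be sketched recursively in Python. The tuple representation below is a hypothetical simplification (it picks the upper median when a subset has even size):

```python
def build_kdtree(points, depth=0):
    """Recursively build a kd-tree: node = (point, left_subtree, right_subtree).

    The splitting axis cycles through the dimensions with the depth;
    the median point along that axis is stored at the node.
    """
    if not points:
        return None
    d = len(points[0])
    axis = depth % d                      # cycle through the dimensions
    points = sorted(points, key=lambda p: p[axis])
    mid = len(points) // 2                # (upper) median along the axis
    return (points[mid],
            build_kdtree(points[:mid], depth + 1),
            build_kdtree(points[mid + 1:], depth + 1))
```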
3.2 Range Search Algorithm
An important concept for understanding range searches using a kd-tree is the correspondence between nodes of the kd-tree and ranges. In Sect. 8.3.1, we explained that each node of the kd-tree stores a point. We can also consider that each node corresponds to an orthogonal range. Let V(v) denote the point in P stored in node v and R(v) denote the corresponding range. Then R(v) is defined as follows:

- For the root node r of the kd-tree, the range R(r) is the whole space.

- For a node v at depth l, the range \(R(v_{\mathrm {left}})\) for the left child \(v_{\mathrm {left}}\) of v is obtained as follows. We partition R(v) into two by the hyperplane that is perpendicular to the \((l \bmod d)\)th axis and contains V(v). Then, \(R(v_{\mathrm {left}})\) is the range with the smaller \((l \bmod d)\)th coordinate values and \(R(v_{\mathrm {right}})\) is the range with the larger \((l \bmod d)\)th coordinate values.
For example, in Fig. 8.2, the range R(h) corresponding to node h is the gray area.
The algorithm for reporting queries using a kd-tree is as follows. The algorithm searches the space by traversing tree nodes from the root. Each time a node v is visited, the algorithm checks whether the corresponding point V(v) (\(\in P\)) is contained in the query range Q or not. If the range R(v) is fully contained in the query range Q, the algorithm outputs all the points stored in the subtree rooted at v. If R(v) and Q have no intersection, the algorithm terminates the search of the subtree. For a counting query, instead of outputting all the points when R(v) is contained in Q, the algorithm finds and accumulates the size of the subtree rooted at v. It may seem impossible to execute the algorithm because the range R(v) for node v is not explicitly stored in the kd-tree. However, if the range R(v) for node v is known, then we know the coordinate values of the hyperplane partitioning the range from the coordinate values of point V(v), and we can compute \(R(v_{\mathrm {left}})\) and \(R(v_{\mathrm {right}})\). Therefore, we can execute the algorithm by keeping the range R(v) during the search.
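The counting procedure just described can be sketched as follows, assuming kd-tree nodes are hypothetical `(point, left_subtree, right_subtree)` tuples. R(v) is carried along as `region`, and the two pruning tests come first:

```python
def subtree_size(node):
    return 0 if node is None else 1 + subtree_size(node[1]) + subtree_size(node[2])

def kd_count(node, query, region, depth=0):
    """Count the points of the kd-tree inside `query`.

    `query` and `region` are lists of (lo, hi) pairs per dimension;
    `region` is R(v), maintained while descending (initially the whole space).
    """
    if node is None:
        return 0
    if all(ql <= rl and rh <= qh for (ql, qh), (rl, rh) in zip(query, region)):
        return subtree_size(node)         # R(v) fully contained in Q
    if any(rh < ql or qh < rl for (ql, qh), (rl, rh) in zip(query, region)):
        return 0                          # R(v) and Q do not intersect
    point, left, right = node
    axis = depth % len(point)
    count = int(all(ql <= c <= qh for c, (ql, qh) in zip(point, query)))
    lo, hi = region[axis]                 # split R(v) by the hyperplane through V(v)
    left_region = list(region); left_region[axis] = (lo, point[axis])
    right_region = list(region); right_region[axis] = (point[axis], hi)
    return (count + kd_count(left, query, left_region, depth + 1)
                  + kd_count(right, query, right_region, depth + 1))
```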
3.3 Complexity Analyses
The time complexity of kd-trees is analyzed in [13]. A counting query takes \(\mathrm {O}\mathord {\left(d\cdot n^{\frac{d-1}{d}} + 2^d\right)}\) time. In general, we assume d is a constant and write the complexity as \(\mathrm {O}\mathord {\left(n^{\frac{d-1}{d}}\right)}\). For a reporting query, we output all coordinates of points in Q. Because a point can be output in constant time, the query time complexity is \(\mathrm {O}\mathord {\left(n^{\frac{d-1}{d}} + k\right)}\).
If \(d \ge \lg n\), the height of the kd-tree is at most d, and therefore each dimension is used for partitioning at most once. Then, in the worst case, it is necessary to traverse all the nodes and a query takes \(\mathrm {O}\mathord {\left(n\right)}\) time.
4 Wavelet Tree
The wavelet tree is a succinct data structure supporting various queries on strings and integer sequences efficiently. It was originally proposed for representing compressed suffix arrays [9], but it later became known that the wavelet tree can support more operations [18]. Orthogonal range search in two-dimensional space is one of these operations [16].
4.1 Construction
The two-dimensional point sets P that can be represented directly using the wavelet tree are those where the coordinates take integer values from 0 to \(n-1\) and the x-coordinate values are all distinct. As explained in Sect. 8.2.2, without loss of generality, we can transform any point set into a point set in \([n]^d\) space. For such a two-dimensional point set P, consider an integer sequence C that contains the y-coordinates of the points in increasing order of x-coordinates. For example, for the point set in Fig. 8.3, the corresponding integer sequence C is 4, 2, 7, 5, 0, 3, 1, 6. For this sequence C, we construct a wavelet tree as follows.
First, we consider that the root of the wavelet tree corresponds to C. Note that we do not store C directly in the wavelet tree. We then focus on the most significant (highest) bit of the \(\left\lceil \lg n \right\rceil \)-bit binary representation of each integer in C. If it is 0 (1), the integer is moved into the left (right) child of the root. We consider that each child node of the root corresponds to an integer sequence containing the numbers moved from the original array C in the same order. For example, in Fig. 8.3, integers from 0 to 3 go to the left child, and integers from 4 to 7 go to the right child. Therefore, the left child corresponds to the integer sequence 2, 0, 3, 1, and the right child to 4, 7, 5, 6.
Next, for each integer sequence of the child nodes, we focus on the second most significant bit of the binary representation of each number. We move a number with a 0 bit to the left, and a number with a 1 bit to the right. Similarly, we repeat this until the integer sequence of a node consists only of identical integers.
Note that we do not store the integer sequences in the nodes of the wavelet tree. In each node, we store a bit string of the same length as the corresponding integer sequence. The ith bit of the bit string is 0 (1) if the ith integer in the integer sequence goes to the left (right) child. In other words, the bit string stored in a node of depth l is the concatenation of the \((l+1)\)th highest bits of the integers in the integer sequence corresponding to the node. In the example in Fig. 8.3, the integer sequence corresponding to the root node is 4, 2, 7, 5, 0, 3, 1, 6, and because integers from 0 to 3 go to the left child and integers from 4 to 7 go to the right child, the bit string stored in the root node is 1, 0, 1, 1, 0, 0, 0, 1. Note that we do not store bit strings at leaf nodes. We show the information stored in the wavelet tree in the right tree in Fig. 8.3. Only the bit strings drawn above the dark gray rectangles, that is, those in the lower row of each node, are stored.
Although it may seem impossible to recover the original information (the integer sequence) from these bit strings, it is possible. Consider the recovery of the fourth integer of the wavelet tree in Fig. 8.3 (right). From the bit string stored in the root node, we know that the first bit of the integer is 1. Because this 1 bit corresponding to the fourth integer is the third 1 in the bit string, we know that the integer to be recovered corresponds to the third bit of the bit string in the right child of the root node. If we look at the third bit of the right child, we know that the second bit of the integer is 0. Further, because this 0 bit is the second 0 in the bit string, the integer to be recovered corresponds to the second bit of the left child of the current node. Finally, from the second bit of the left child, we know the last bit of the integer to be recovered is 1. Therefore, the fourth integer is 101 in binary, that is, 5. This is shown in Fig. 8.4.
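The construction and the recovery procedure can be sketched together. The tuple representation `(bits, left, right)` below is a hypothetical simplification that stores bit strings as Python lists and uses linear-time rank (counting with `count`):

```python
def build_wavelet(seq, lo, hi):
    """Build a wavelet tree over integers in [lo, hi): node = (bits, left, right).

    Bit i is 0 (1) if seq[i] falls in the lower (upper) half of [lo, hi)."""
    if hi - lo <= 1 or not seq:
        return None                       # leaf level: sequence is constant
    mid = (lo + hi) // 2
    bits = [int(x >= mid) for x in seq]
    return (bits,
            build_wavelet([x for x in seq if x < mid], lo, mid),
            build_wavelet([x for x in seq if x >= mid], mid, hi))

def access(node, i, lo, hi):
    """Recover the i-th integer of the original sequence."""
    while hi - lo > 1:
        bits, left, right = node
        mid = (lo + hi) // 2
        if bits[i] == 0:
            i = bits[:i].count(0)         # rank0: position in the left child
            node, hi = left, mid
        else:
            i = bits[:i].count(1)         # rank1: position in the right child
            node, lo = right, mid
    return lo
```

On the sequence 4, 2, 7, 5, 0, 3, 1, 6, `access(wt, 3, 0, 8)` retraces the recovery of the fourth integer described above.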
In this recovery operation, we need to compute the number of zeros/ones in the first i bits of a bit string. This operation is also used in the range search algorithm in the next section. If we look at the bits one by one from the beginning of a bit string, it takes \(\mathrm {O}\mathord {\left(n\right)}\) time, which is too slow. We therefore represent the bit string of each node by the following data structure [5, 12].
Lemma 8.2
For a bit string of length n, there exists a data structure using \(n + \mathrm {o}\mathord {\left(n\right)}\) bits which answers a rank/select query in constant time, where the rank query \(\mathrm {rank}_{b}\left(B,i\right)\) is to count the number of b bits (\(b=0,1\)) in the bits from B[0] to B[i] (\(i \ge 0\)) of a bit string B, and the select query \(\mathrm {select}_{b}\left(B,i\right)\) is to return the position of the ith b (\(i \ge 1\), \(b=0,1\)) in a bit string B.
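A toy version of such a structure can look like the class below; the prefix counts stand in for the o(n)-bit auxiliary index of the lemma (they are plain integers here), and select is linear-time rather than constant:

```python
class RankSelect:
    """Bit vector with rank via precomputed prefix counts of ones."""
    def __init__(self, bits):
        self.bits = bits
        self.prefix = [0]                 # prefix[i] = ones among bits[0..i-1]
        for b in bits:
            self.prefix.append(self.prefix[-1] + b)

    def rank(self, b, i):
        """Number of b-bits among bits[0..i] (i >= 0)."""
        ones = self.prefix[i + 1]
        return ones if b == 1 else (i + 1 - ones)

    def select(self, b, j):
        """Position of the j-th b-bit (j >= 1), or -1 if there is none."""
        count = 0
        for pos, bit in enumerate(self.bits):
            count += bit == b
            if count == j:
                return pos
        return -1
```

For the root bit string 1, 0, 1, 1, 0, 0, 0, 1 of Fig. 8.3, \(\mathrm {rank}_{0}\left(B,6\right) = 4\) and \(\mathrm {select}_{1}\left(B,3\right) = 3\).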
The select query is also necessary for range searches using a wavelet tree.
4.2 Range Search Algorithm
We explain how to solve the two-dimensional range search problem using a wavelet tree. First, we explain the counting query, which is performed by a recursive function as in Algorithm 1. For a query range \(Q=[l, r] \times [b, t]\), the initial call of the function is \(\textsc {WTCounting}(l, r, b, t, v_{\mathrm {root}}, 0, 2^{\left\lceil \lg n \right\rceil }-1)\), where \(v_{\mathrm {root}}\) is the root node of the wavelet tree. The left (right) child of node v is represented by \(v_{\mathrm {left}}\) (\(v_{\mathrm {right}}\)). The bit string stored in node v is represented by v.B.
We explain the algorithm in Fig. 8.5 using the example of searching a range \(Q = [1,6] \times [1,4]\) for the point set P in Fig. 8.3.
The search algorithm traverses the tree from the root. During the search, the algorithm keeps the interval I of the integer sequence (or bit string) corresponding to an interval of the x-coordinates of the query range. In the example in Fig. 8.5, we focus on the interval \(I = [1,6]\) at the root node. To move to the left child, we need to compute the interval corresponding to the query range. This is done by a rank query that counts the number of zeros from the beginning of the bit string to a specified position. In the bit string stored in the root node, the number of zeros from the beginning to the 0th position (in general, if the interval is \(I = [l , r]\), to the \((l-1)\)th position) is 0, so we know the interval corresponding to the query starts at position 0. Because the number of zeros from the beginning to the 6th position (in general, if the interval is \(I = [l , r]\), to the rth position) is four, we know the interval ends at position 3. Thus, we obtain the interval \(I = [0,3]\) for the left child. Similarly, for the right child, by using rank queries counting the number of ones, we can obtain the interval \(I = [1,2]\).
We repeat this process by going down the tree maintaining an interval. When we reach a leaf, we can determine if the y-coordinate of the point is included in the query range. However, we can sometimes determine this at an earlier stage. For example, in Fig. 8.5, after obtaining the interval \(I = [0,3]\) at the left child of the root, for the right child of the current node the interval of the y-coordinates corresponding to the node is [2, 3], which is completely included in the interval [1, 4] of the y-coordinates of the query range. Therefore, for the two points we focus on at this node, both the x- and y-coordinates are included in the query range, and we have found two points in the query range. In contrast, after computing the interval \(I = [1,2]\) for the right child of the root, the interval of the y-coordinates corresponding to the right child of the current node is [6, 7], which has no intersection with the interval [1, 4] of the y-coordinates of the query range. We do not need to further search that subtree.
As observed above, in a range search using a wavelet tree, if the query range is \(Q = [l,r] \times [b,t]\), we first focus on the points for which the x-coordinates are contained in Q, that is, contained in the range \([l,r] \times [0,n-1]\). Next, the process of traversing down the tree corresponds to partitioning the range into two according to the y-coordinate. If an obtained range is completely contained in the query range, or does not intersect with the query range, we terminate searching the subtree.
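The counting search just described can be sketched recursively. This is a simplified stand-in for Algorithm 1, not the chapter's code: nodes are `(bits, left, right)` tuples, rank is linear-time, and the interval arithmetic for the two children is the pair of rank0 computations:

```python
def build(seq, lo, hi):
    """Wavelet tree node = (bits, left, right) over the value range [lo, hi)."""
    if hi - lo <= 1 or not seq:
        return None
    mid = (lo + hi) // 2
    bits = [int(x >= mid) for x in seq]   # 1 = value goes to the upper half
    return (bits,
            build([x for x in seq if x < mid], lo, mid),
            build([x for x in seq if x >= mid], mid, hi))

def wt_count(node, x1, x2, b, t, lo, hi):
    """Count points with x in [x1, x2] and y in [b, t]; [lo, hi) is the
    y-range corresponding to the current node."""
    if x1 > x2 or t < lo or hi - 1 < b:
        return 0                          # empty interval or disjoint y-range
    if b <= lo and hi - 1 <= t:
        return x2 - x1 + 1                # node's y-range fully inside [b, t]
    bits, left, right = node
    mid = (lo + hi) // 2
    zeros_before = bits[:x1].count(0)     # rank0 up to position x1 - 1
    zeros_upto = bits[:x2 + 1].count(0)   # rank0 up to position x2
    return (wt_count(left, zeros_before, zeros_upto - 1, b, t, lo, mid)
            + wt_count(right, x1 - zeros_before, x2 - zeros_upto, b, t, mid, hi))
```

On the running example (C = 4, 2, 7, 5, 0, 3, 1, 6 and Q = [1,6] × [1,4]), the recursion visits exactly the intervals traced above and returns 3.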
For counting queries, it is sufficient to sum the numbers of points. For reporting queries, the extra work of computing the coordinates of the points is also required. This is shown in Algorithm 2.
The outline of the reporting query is the same as that of the counting query. In Algorithm 1, we obtain the number of points in Line 2. We change it to output, one by one, the coordinates of the points corresponding to the interval \([x_1,x_2]\) of the bit string v.B. The x- and y-coordinates of each point are obtained by \(\textsc {WTReportX}\) and \(\textsc {WTReportY}\), respectively. The algorithm \(\textsc {WTReportY}\) for computing the y-coordinate (Algorithm 4) is similar to the algorithm for recovering a value of the original integer array explained in Sect. 8.4.1. We compute the y-coordinates by traversing down the tree using rank queries.
In contrast, the algorithm \(\textsc {WTReportX}\) for computing the x-coordinate (Algorithm 3) traverses up the tree using select queries. We explain this by example. In Fig. 8.5, assume that at node v, which is the right child of the left child of the root, we find that the points corresponding to the interval \(I=[0,1]\) are contained in the query range. Consider the computation of the x-coordinate of the point corresponding to the bit v.B[1]. First, the node v we focus on is the right child of its parent. We find the position of the second 1 in the parent by a select query. Then we know that the point corresponds to the bit \(v'.B[2]\) in the parent node \(v'\). Next, because the current node is the left child of its parent (the root), we find the position of the third 0 in the bit string of the parent by a select query. Now we know that the point corresponds to the bit r.B[5] at the root node r. That is, the x-coordinate of the point is 5.
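The upward walk can be sketched independently of the full tree: given the bit strings along the root-to-v path and which child was taken at each step, repeated select queries recover the position in the root, that is, the x-coordinate. The function name and path representation are hypothetical:

```python
def report_x(path, i):
    """Recover the x-coordinate of the bit at position i in the deepest node.

    `path` lists, from the root down, pairs (bits, side): a node's bit
    string and the child taken from it (0 = left, 1 = right).  Moving up,
    select_side(bits, i + 1) maps a child position to the parent position.
    """
    for bits, side in reversed(path):
        i = [pos for pos, b in enumerate(bits) if b == side][i]
    return i
```

For the example above, the path is [(root bits 1, 0, 1, 1, 0, 0, 0, 1, side 0), (left-child bits 1, 0, 1, 0, side 1)], and position i = 1 in node v yields x-coordinate 5.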
As shown above, we can traverse the nodes of the wavelet tree using rank and select queries on bit strings. For range searches, we traverse down the tree from the root, computing the intervals of x-coordinates corresponding to the query range. If we find a node where the corresponding interval of y-coordinates is contained in the query range, we answer the query by computing the length of the interval, or the coordinate values by traversing the tree.
4.3 Complexity Analyses
We now analyze the space complexity of the wavelet tree and query time complexities for the orthogonal range search problem.
First, we analyze the space complexity. The height of the wavelet tree is \(\left\lceil \lg n \right\rceil \). The total length of the bit strings stored in the nodes at the same depth is always n. Therefore, the total length of all the bit strings in the wavelet tree is \(n \lg n\). We can concatenate all the bit strings and store only one long bit string; then it is not necessary to store the tree structure of the wavelet tree. By using the data structure of Lemma 8.2 for this long bit string, the space complexity is \(n \lg n + \mathrm {o}\mathord {\left(n \lg n\right)}\) bits in total.
Next, we consider the query time complexities. For a counting query, we consider the number of visited nodes. In the wavelet tree, each time we traverse an edge toward a leaf, points with small y-coordinates go to the left child and points with large y-coordinates go to the right child. At the leaves, we can consider all points to be sorted in increasing order of y-coordinates. This means that the leaf nodes corresponding to the interval of y-coordinates of the query range occupy consecutive positions in the wavelet tree. Now, consider the set M of nodes of the wavelet tree defined as follows. The set M contains the maximal nodes v such that the y-coordinates corresponding to the leaf nodes in the subtree rooted at v are contained in the query range; that is, the y-coordinates of the leaves in the subtree of v are contained in the query range, but the subtree of the parent of v contains some leaf whose y-coordinate is not contained in the query range. This is the set of nodes at which we stop searching further down in a counting query using the wavelet tree; in Fig. 8.6, it is shown as dark gray nodes.
Let A be the set of nodes that are proper ancestors of nodes of M. This is the set of nodes visited before reaching the nodes of M, shown as light gray nodes in Fig. 8.6. The number of nodes visited in a counting query is then \(|A| + |M|\). We now consider the sizes of M and A.
For the size of the set M, the following lemma holds.
Lemma 8.3
It holds that \(|M| = \mathrm {O}\mathord {\left(\lg n\right)}\).
Proof
(Lemma 8.3) The set M can be constructed as follows. Let \(M'\) be the set of leaf nodes of the wavelet tree corresponding to the interval of y-coordinates of the query range. If two nodes \(v_1\) and \(v_2\) of \(M'\) have a common parent node v, we remove \(v_1\) and \(v_2\) from \(M'\) and add v to \(M'\). By repeating this process until no such pair of nodes remains, the set \(M'\) coincides with M.
For each depth of the wavelet tree, the number of nodes of that depth belonging to M is then at most two, because if there were more than two, two of them would have the same parent. This completes the proof that \(|M| = \mathrm {O}\mathord {\left(\lg n\right)}\).
For the size of the set A, the following lemma holds.
Lemma 8.4
It holds that \(|A| = \mathrm {O}\mathord {\left(\lg n\right)}\).
Proof
(Lemma 8.4) Consider a node v in the set A. Among the leaf nodes in the subtree rooted at v, there must exist a leaf whose y-coordinate is included in the query range and a leaf whose y-coordinate is not. Therefore, for each depth of the wavelet tree, there are at most two such nodes in A, because if there existed more than two, then for a node in the middle, the y-coordinates of all the leaves in its subtree would be contained in the query range. This completes the proof that \(|A| = \mathrm {O}\mathord {\left(\lg n\right)}\).
From the above discussion, the number of nodes visited in a counting query is \(|A| + |M| = \mathrm {O}\mathord {\left(\lg n\right)}\). When we visit a new node, we use a constant number of rank queries. Because a rank query takes constant time (Lemma 8.2), the time complexity of a counting query using the wavelet tree is \(\mathrm {O}\mathord {\left(\lg n\right)}\).
For a reporting query, it is necessary to compute the coordinates of the points in the query range. As explained in Sect. 8.4.2, x-coordinates are computed by traversing up the tree and y-coordinates by traversing down the tree, with the coordinates of each point computed by visiting \(\mathrm {O}\mathord {\left(\lg n\right)}\) nodes. Moving to an adjacent node in the wavelet tree is done by a constant number of rank/select queries, and each rank/select query takes constant time (Lemma 8.2). Therefore, the coordinates of a point are obtained in \(\mathrm {O}\mathord {\left(\lg n\right)}\) time, and the time complexity of a reporting query using the wavelet tree is \(\mathrm {O}\mathord {\left((1+k)\lg n\right)}\), where k is the number of output points.
We obtain the following theorem.
Theorem 8.1
The space complexity of the wavelet tree representing a two-dimensional point set on \([n]^2\) is \(n \lg n + \mathrm {o}\mathord {\left(n \lg n\right)}\) bits; a counting query takes \(\mathrm {O}\mathord {\left(\lg n\right)}\) time, and a reporting query takes \(\mathrm {O}\mathord {\left((k+1)\lg n\right)}\) time, where k is the number of points to enumerate.
As shown in Sect. 8.2.1, the information-theoretic lower bound for a point set on \([n]^2\) is \(n \lg n + \mathrm {O}\mathord {\left(n\right)}\) bits. Therefore, the wavelet tree is a succinct data structure. Bose et al. [3] improved the query time complexities while keeping the space complexity:
Theorem 8.2
Let P be a set of points on \(M = [1..n]\times [1..n]\) in which all points have distinct xcoordinates. Then, there exists a data structure using \(n\lg n + \mathrm {o}\mathord {\left(n\lg n\right)}\) bits that answers a counting query in \(\mathrm {O}\mathord {\left(\lg \lg n\right)}\) time and a reporting query in \(\mathrm {O}\mathord {\left((1+k) \lg n / \lg \lg n\right)}\) time, where k is the number of points to output.
5 Proposed Data Structure 1: Improved Query Time Complexity
This data structure uses the idea of adding data structures to the kd-tree to improve the query time complexity [20]. First, we explain the idea of [20] in Sect. 8.5.1. Next, we describe the index construction in Sect. 8.5.2 and the range search algorithm in Sect. 8.5.3, and analyze the complexities in Sect. 8.5.4.
5.1 Idea for Improving the Time Complexity of the kdTree
The method proposed in [20] improves the query time complexity of the kd-tree by adding d many wavelet trees to the kd-tree such that the term \(n^{(d-1)/d}\) is replaced by \(n^{(d-2)/d}\) (\(\lg n\) if \(d=2\)), at the cost of increasing the total complexity by a factor of \(\mathrm {O}\mathord {\left(\lg n\right)}\). Note that we assume point sets are on \([n]^d\).
First, we construct the kd-tree for a given set P of points in the d-dimensional space. Next, we label the nodes of the kd-tree with numbers based on the inorder traversal of a binary tree, defined as follows:

1.
If the root node has a left child, traverse the subtree rooted at the left child.

2.
Examine the root node.

3.
If the root node has a right child, traverse the subtree rooted at the right child.
Figure 8.7 shows an example of a point set (left) and the numbers assigned based on the inorder traversal of the kd-tree of the set (right).
Next, we make point sets \(P_i\ (i = 0, \ldots , d-1)\), each with n points on \([n]^2\). The two-dimensional point set \(P_i\) is created as follows. If a point p in the original d-dimensional point set P has the ith coordinate value \(p_i\) and the inorder number of the node of the kd-tree containing p is j, we add the point \((j, p_i)\) to \(P_i\). Figure 8.8 shows the point sets \(P_0, P_1\) created from the point set in Fig. 8.7.
From these two-dimensional point sets \(P_0, \ldots ,P_{d-1}\), we construct wavelet trees \(W_0, \ldots ,W_{d-1}\). The wavelet tree \(W_i\) can be thought of as constructed from an integer sequence \(A_i\) containing the ith coordinate values of the points of P in the inorder of the kd-tree.
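The inorder numbering and the sequences \(A_i\) can be sketched as follows. This is our own code, not the chapter's: with the median rule below, the root of the subtree covering inorder interval [a, b] receives number \(\lceil (a+b)/2 \rceil \), consistent with the rule used in Sect. 8.5.3.

```python
# Sketch of the inorder numbering of Sect. 8.5.1 (our own code, not the
# chapter's).  kdtree_inorder returns the points in kd-tree inorder, so
# the point at index j of the result is stored at the node with inorder
# number j.
def kdtree_inorder(points, depth=0):
    if not points:
        return []
    axis = depth % len(points[0])          # cycle through the dimensions
    pts = sorted(points, key=lambda p: p[axis])
    mid = len(pts) // 2                    # median by a fixed rule
    return (kdtree_inorder(pts[:mid], depth + 1) + [pts[mid]] +
            kdtree_inorder(pts[mid + 1:], depth + 1))

points = [(0, 3), (1, 1), (2, 4), (3, 0), (4, 2)]
order = kdtree_inorder(points)     # [(1,1), (0,3), (2,4), (3,0), (4,2)]
# A[i] lists the i-th coordinates in inorder; W_i is built from A[i]
A = [[p[i] for p in order] for i in range(2)]
```

Here the root splits on x and receives inorder number 2 (= \(\lceil (0+4)/2 \rceil \)), and the two subtrees split on y.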
These data structures can be used for range searches as follows. Given a query range Q, we perform the original search using the kd-tree. In the original algorithm, as explained in Sect. 8.3, we traverse the kd-tree and shrink the range R(v), and when \(\mathrm {CDeg}(R(v),Q) = d\) (i.e., \(R(v) \subseteq Q\)), we know that all the points in the subtree rooted at v are contained in Q. By using the d wavelet trees, we can terminate the search as soon as \(\mathrm {CDeg}(R(v),Q) = d-1\). Assume that when a node v is visited, R(v) is contained in Q for all dimensions except dimension i. The inorder numbers of the nodes in the subtree rooted at v have consecutive values; let [a, b] be this interval of numbers. Then, the points with numbers in this interval are contained in Q in every dimension except dimension i. This implies that the points of \(P_i\) that are contained in the range \([a,b] \times [l_i^{(Q)}, u_i^{(Q)}]\) are contained in Q also in dimension i. Therefore, after finding the node v, it is sufficient to search the range \([a,b] \times [l_i^{(Q)}, u_i^{(Q)}]\) of \(P_i\) using the wavelet tree \(W_i\).
The number of nodes of the kd-tree visited by this method is \(\mathrm {O}\mathord {\left(n^{(d-2)/d}\right)}\) (\(\mathrm {O}\mathord {\left(\lg n\right)}\) for the case \(d=2\)). The search of the last dimension using the wavelet tree takes \(\mathrm {O}\mathord {\left(\lg n\right)}\) time for a counting query. Therefore, the time complexity of a counting query using the kd-tree is improved to \(\mathrm {O}\mathord {\left(n^{(d-2)/d} \lg n\right)}\) (\(\mathrm {O}\mathord {\left(\lg ^2 n\right)}\) for the case \(d=2\)).
5.2 Index Construction
We now explain the proposed data structure. First, we construct the kd-tree for a given point set P. Note that this kd-tree is built temporarily in order to construct our data structure and is not included in the final structure. Next, as in Sect. 8.5.1, we number the nodes of the kd-tree by an inorder traversal and create d many two-dimensional point sets \(P_0 , \ldots , P_{d-1}\). For each \(P_i\), we create the data structure of [3]; let \(B_i\) be this data structure. Finally, we discard the kd-tree. The final data structure consists of \(B_0, \ldots ,B_{d-1}\).
5.3 Range Search Algorithm
We explain the algorithm for a reporting query using the data structure described in the previous section. The pseudocode is shown in Algorithm 5. This algorithm simulates a search of the kd-tree using \(B_0 , \ldots ,B_{d-1}\); we explain it in comparison with the search algorithm of the kd-tree. Note that we assume the inorder number of each node v of the kd-tree is also assigned to the point V(v) stored in v; that is, the point with number j is the point stored in the node with inorder number j. We also assume that for an interval [a, b] of point numbers, R([a, b]) denotes the range containing the points with numbers in [a, b]. In Algorithm 5, the interval [a, b] of point numbers always corresponds to the interval of inorder numbers of the nodes in the subtree rooted at some node v of the kd-tree; therefore, R([a, b]) coincides with R(v).
If we use the kd-tree, we shrink the focused range R(v) by going down the tree. In the proposed method, by shrinking the interval [a, b] of point numbers, we shrink the corresponding range R([a, b]). Because the kd-tree stores the point V(v) corresponding to a node v, we can obtain the information of the point used for partitioning the space. In contrast, in the proposed method, points are not explicitly stored. However, if the focused interval [a, b] coincides with the interval of inorder numbers for the subtree rooted at a node v, we find that \(c = \lceil (a+b)/2 \rceil \) is the number of the point used for partitioning (see Note 1). Furthermore, the intervals \([a,c-1]\) and \([c+1,b]\) correspond to the intervals of numbers for the subtrees rooted at the left and right children of v, respectively. Therefore, by the recursive search of Algorithm 5, we can obtain the correct partitioning points.
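The interval bookkeeping can be sketched with a small helper (our own, hypothetical name):

```python
import math

def split_interval(a, b):
    """For the inorder-number interval [a, b] of a kd-tree subtree, return
    the number c of the partitioning point and the child intervals."""
    c = math.ceil((a + b) / 2)
    return c, (a, c - 1), (c + 1, b)

# e.g. the interval [0, 6] of a 7-node subtree splits at point number 3
# into the left interval [0, 2] and the right interval [4, 6]
```

Recursing with these child intervals reproduces exactly the subtree intervals of the discarded kd-tree.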
For the range R([a, b]), we can compute the ranges after a partition from the range before the partition and the coordinates of the point used for partitioning, as in the case of the kd-tree.
5.4 Complexity Analyses
We now analyze the complexities of the algorithm. First, we consider its space complexity. We use d data structures of Bose et al. [3], each of which uses \(n\lg n + \mathrm {o}\mathord {\left(n\lg n\right)}\) bits as in Theorem 8.2. The total space complexity is then \(dn\lg n + \mathrm {o}\mathord {\left(dn\lg n\right)}\) bits.
Next, we consider the query time complexity. If we use the same analysis as in [20], assuming d is a constant, we can show that the number of nodes corresponding to cells with containment degree at most \(d-1\) is \(\mathrm {O}\mathord {\left(n^{\frac{d-2}{d}}\right)}\). Here, we derive the query time complexity using a novel analysis for non-constant d.
The proposed method partitions the space for each dimension in order, in the same fashion as the kd-tree. As in [20], we define a series of partitions with respect to dim. 0 to dim. \(d-1\) as a cycle. We then calculate the number \(T_m(n,d)\) of nodes at which the containment degree with respect to Q is at most \(d-2\) in the mth cycle. When the (\(m-1\))th cycle has finished, the space is partitioned into \(2^{d(m-1)}\) many cells. Among them, we count the number of cells for which the containment degree with respect to Q is at most \(d-2\). These cells contain a (\(d-2\))-dimensional face of Q (an edge of a cuboid if \(d=3\)). A (\(d-2\))-dimensional face of a d-dimensional orthogonal range Q is obtained by choosing two dimensions from the d dimensions and choosing the upper side or the lower side of the range for each of the two dimensions. Therefore, Q has \(\left( {\begin{array}{c}d\\ 2\end{array}}\right) 2^2\) many (\(d-2\))-dimensional faces. When the (\(m-1\))th cycle has finished, because each dimension is partitioned into \(2^{m-1}\) parts, the number of cells containing a fixed (\(d-2\))-dimensional face is at most \(2^{(m-1)(d-2)}\). Then after the (\(m-1\))th cycle, the number of cells to be searched is at most \(\left( {\begin{array}{c}d\\ 2\end{array}}\right) 2^2 \cdot 2^{(m-1)(d-2)}\).
In the subtrees rooted at these nodes, the number of nodes in the mth cycle is \(2^d-1\). Therefore, it holds that \(T_m(n,d) \le \left( {\begin{array}{c}d\\ 2\end{array}}\right) 2^2 \cdot 2^{(m-1)(d-2)} \left( 2^d-1\right) \).
Let N(n, d) be the number of nodes for which the containment degree with respect to Q is at most \(d-2\). Because the kd-tree has height \(\lg n\) and hence \(\lg n / d\) cycles, and the sum of the \(T_m(n,d)\) is a geometric series dominated by its last term, it then holds that \(N(n,d) = \sum _{m=1}^{\lceil (\lg n)/d \rceil } T_m(n,d) = \mathrm {O}\mathord {\left(d^2 n^{\frac{d-2}{d}}\right)}\).
We use the fact that the containment degree is weakly increasing as we traverse down the tree. In the proposed method, we terminate the search when the containment degree reaches \(d-1\). The visited nodes are then those with containment degree at most \(d-2\) and their child nodes, of which there are at most 2N(n, d). The proposed method virtually traverses the kd-tree, and it takes \(\mathrm {O}\mathord {\left(d \frac{\lg n}{\lg \lg n}\right)}\) time to compute the coordinates of the point stored in a node. When we reach a node for which the containment degree with respect to Q is \(d-1\), we search the remaining dimension in \(\mathrm {O}\mathord {\left(\frac{\lg n}{\lg \lg n}\right)}\) time. The time complexity of a counting query is then \(\mathrm {O}\mathord {\left(d^3 n^{\frac{d-2}{d}} \frac{\lg n}{\lg \lg n}\right)}\). For a reporting query, it takes \(\mathrm {O}\mathord {\left(d \frac{\lg n}{\lg \lg n}\right)}\) time to compute the coordinates of each output point. The total time complexity is then \(\mathrm {O}\mathord {\left(\left( d^3 n^{\frac{d-2}{d}} + dk \right) \frac{\lg n}{\lg \lg n}\right)}\), where k is the number of points in Q. In summary, we obtain the following:
Theorem 8.3
For the orthogonal range search problem on the \([n]^d\) space, there exists a data structure that has space complexity \(dn \lg n + \mathrm {o}\mathord {\left(dn \lg n\right)}\) bits and that answers a counting query in \(\mathrm {O}\mathord {\left(d^3 n^{\frac{d-2}{d}} \frac{\lg n}{\lg \lg n}\right)}\) time and a reporting query in \(\mathrm {O}\mathord {\left(\left( d^3 n^{\frac{d-2}{d}} + dk \right) \frac{\lg n}{\lg \lg n}\right)}\) time, where k is the number of points in the query range.
6 Proposed Data Structure 2: Succinct and Practically Fast
The second proposed method is a data structure that is succinct and practically fast. In this method, we use \(d-1\) many wavelet trees to represent a point set on \([n]^d\). In Sect. 8.6.1, we explain how to construct the data structure. In Sect. 8.6.2, we explain the algorithm for the orthogonal range search problem. In Sect. 8.6.3, we analyze the space and time complexities.
6.1 Index Construction
In this method, we assume that the points of P have distinct 0th coordinate values.
First, we create length-n integer arrays \(A_1, \ldots , A_{d-1}\). The array \(A_i\) corresponds to dimension i and stores the ith coordinate values of the points in increasing order of their 0th coordinate values. Next, from these arrays we create wavelet trees \(W_1, \ldots ,W_{d-1}\). The wavelet tree \(W_i\) can be considered to represent the two-dimensional point set \(P_i\) generated from the d-dimensional point set P by projecting the points onto the plane spanned by the 0th axis and the ith axis. Figure 8.9 shows an example.
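The construction of the arrays can be sketched as follows (our own code; the function name is ours):

```python
# Sketch of the construction in Sect. 8.6.1 (our own code): element i-1 of
# the result is the array A_i, holding the i-th coordinates of the points
# sorted by their (distinct) 0th coordinates.  The wavelet tree W_i would
# then be built over A_i.
def build_arrays(points):
    d = len(points[0])
    pts = sorted(points)                   # sort by the 0th coordinate
    return [[p[i] for p in pts] for i in range(1, d)]

# build_arrays([(2, 5, 1), (0, 3, 4), (1, 0, 2)]) -> [[3, 0, 5], [4, 2, 1]]
```

Each \(A_i\) is thus a sequence indexed by rank in the 0th dimension, which is exactly the projection \(P_i\) described above.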
6.2 Range Search Algorithm
Next, we explain how to solve the orthogonal range search problem using the data structure (AlgorithmÂ 6).
Assume that a query range \(Q = \left[ l^{(Q)}_{0}, u^{(Q)}_{0}\right] \times \cdots \times \left[ l^{(Q)}_{d-1}, u^{(Q)}_{d-1}\right] \) is given. For each \(i = 1, \ldots , d-1\) such that \(\left[ l^{(Q)}_{i}, u^{(Q)}_{i}\right] \ne [0, n-1]\), that is, each dimension i that restricts the search, we count the number of points of \(P_i\) contained in the range \(\left[ l^{(Q)}_{0}, u^{(Q)}_{0}\right] \times \left[ l^{(Q)}_{i}, u^{(Q)}_{i}\right] \) using the wavelet tree \(W_i\) (a counting query). Let m be the number of such indices i, and let \(i_1, \ldots ,i_m\) be these indices sorted in increasing order of the answers to the counting queries.
Using the wavelet tree \(W_{i_1}\), we then enumerate only the x-coordinates of the points of \(P_{i_1}\) contained in \(\left[ l^{(Q)}_{0}, u^{(Q)}_{0}\right] \times \left[ l^{(Q)}_{i_1}, u^{(Q)}_{i_1}\right] \) and store them in a set A. For each element a of A and for each \(i = i_2, \ldots , i_m\), we check whether the ith coordinate of the point whose 0th coordinate is a is contained in the query range. The elements remaining in A correspond to the points in the query range. The answer to a counting query is the cardinality of A. For a reporting query, we compute the coordinates of the points and output them.
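The control flow of this query can be modeled as follows. This is our own sketch of the outline of Algorithm 6, not the chapter's pseudocode: plain scans stand in for the wavelet-tree counting and reporting queries, and all names are ours.

```python
def range_search(points, Q, n):
    """Report the points contained in Q = [(l_0,u_0), ..., (l_{d-1},u_{d-1})].
    Plain scans stand in for the wavelet-tree queries of Algorithm 6."""
    d = len(points[0])
    by_x = {p[0]: p for p in points}       # 0th coordinates are distinct
    # dimensions whose interval actually restricts the query
    dims = [i for i in range(1, d) if Q[i] != (0, n - 1)]

    def count(i):                          # stand-in for a counting query on W_i
        return sum(1 for p in points
                   if Q[0][0] <= p[0] <= Q[0][1] and Q[i][0] <= p[i] <= Q[i][1])

    dims.sort(key=count)                   # cheapest dimension first
    if not dims:
        return [p for p in points if Q[0][0] <= p[0] <= Q[0][1]]
    i1 = dims[0]                           # stand-in for a reporting query on W_{i1}
    A = [p[0] for p in points
         if Q[0][0] <= p[0] <= Q[0][1] and Q[i1][0] <= p[i1] <= Q[i1][1]]
    for i in dims[1:]:                     # filter the survivors dimension by dimension
        A = [x for x in A if Q[i][0] <= by_x[x][i] <= Q[i][1]]
    return [by_x[x] for x in A]
```

Starting from the dimension with the fewest matches keeps the candidate set A small throughout the filtering loop, which is the point of sorting the dimensions by their counting-query answers.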
The reason we first compute the number of points contained in each dimension by a counting query is twofold. First, the x-coordinates (the 0th coordinates) of the points contained in the query range with respect to the \(i_1\)th (and the 0th) dimension can be output quickly at line 9 of the algorithm if the number of points to enumerate is small. Second, in the double loop in lines 10–16, we want to reduce the size of A as early as possible.
6.3 Complexity Analyses
Consider the space and time complexities of the proposed method.
For the space complexity, we use \(d-1\) many wavelet trees. Therefore, the space complexity is \((d-1) n \lg n + (d-1) \cdot \mathrm {o}\mathord {\left(n \lg n\right)}\) bits.
For the query time complexity, let m be the number of wavelet trees used in a search. The time to perform m counting queries on the wavelet trees is \(\mathrm {O}\mathord {\left(m \lg n\right)}\). We then sort m integers in \(\mathrm {O}\mathord {\left(m \lg m\right)}\) time. Next, we enumerate the x-coordinates of the points contained in the query range for the dimension with the minimum number of points. Let \(c_{i_1} = c_{\min }\) be the number of points to enumerate; this takes \(\mathrm {O}\mathord {\left((1 + c_{\min }) \lg n\right)}\) time. The time to check whether these points are contained in the query range for the other dimensions is \(\mathrm {O}\mathord {\left((m-1)c_{\min } \lg n\right)}\). Let \(d'\) be the number of dimensions used in the query; then \(m \le d'\). Therefore, the query time complexity can be written as \(\mathrm {O}\mathord {\left(d'c_{\min } \lg n + d'\lg d'\right)}\).
7 Conclusion
In this chapter, we first reviewed data structures for high-dimensional orthogonal range search. We then proposed two data structures for the problem.
The first one simulates the search of the kd-tree using d succinct data structures for two-dimensional orthogonal range search [3]. We improved the query time complexity of the KDW-tree while keeping the same space complexity.
The second one is succinct and practically fast. The space complexity is \((d-1)n\lg n + (d-1)\cdot \mathrm {o}\mathord {\left(n \lg n\right)}\) bits, which is succinct. The worst-case query time complexity is \(\mathrm {O}\mathord {\left(dn \lg n\right)}\), which is not good. However, if the number d of dimensions is large but the number \(d'\) of dimensions used in a search is small, it runs fast in practice.
Notes
 1.
In the kd-tree, at each depth, we partition the space by the median of the point set with respect to one dimension; therefore, \(c = \lceil (a+b)/2 \rceil \) is the number of the point used for partitioning. If the point set contains an even number of points, we can obtain the correct partitioning point using a predetermined rule.
References
J.L. Bentley, Multidimensional binary search trees used for associative searching. Commun. ACM 18(9), 509–517 (1975)
J.L. Bentley, Decomposable searching problems. Inf. Process. Lett. 8(5), 244–251 (1979)
P. Bose, M. He, A. Maheshwari, P. Morin, Succinct orthogonal range search structures on a grid with applications to text indexing, in Proceedings of the 11th Workshop on Algorithms and Data Structures (WADS 2009) (Springer, 2009), pp. 98–109
B. Chazelle, A functional approach to data structures and its use in multidimensional searching. SIAM J. Comput. 17(3), 427–462 (1988)
D. Clark, Compact Pat Trees. Ph.D. thesis (University of Waterloo, 1997)
R.A. Finkel, J.L. Bentley, Quad trees: a data structure for retrieval on composite keys. Acta Inform. 4(1), 1–9 (1974)
H.N. Gabow, J.L. Bentley, R.E. Tarjan, Scaling and related techniques for geometry problems, in Proceedings of the 16th Annual ACM Symposium on Theory of Computing (STOC 1984) (ACM, 1984), pp. 135–143
T. Gagie, G. Navarro, S.J. Puglisi, New algorithms on wavelet trees and applications to information retrieval. Theor. Comput. Sci. 426–427, 25–41 (2012)
R. Grossi, A. Gupta, J.S. Vitter, High-order entropy-compressed text indexes, in Proceedings of the 14th Annual ACM-SIAM Symposium on Discrete Algorithms (SODA 2003) (SIAM, 2003), pp. 841–850
K. Ishiyama, K. Sadakane, Practical space-efficient data structures for high-dimensional orthogonal range searching, in Proceedings of the 10th International Conference on Similarity Search and Applications (SISAP 2017) (Springer, 2017), pp. 234–246
K. Ishiyama, K. Sadakane, A succinct data structure for multidimensional orthogonal range searching, in Proceedings of the Data Compression Conference 2017 (DCC 2017) (2017), pp. 270–279
G.J. Jacobson, Succinct Static Data Structures. Ph.D. thesis (Pittsburgh, PA, USA, 1988)
D.T. Lee, C.K. Wong, Worst-case analysis for region and partial region searches in multidimensional binary search trees and balanced quad trees. Acta Inform. 9(1), 23–29 (1977)
D.T. Lee, C.K. Wong, Quintary trees: a file structure for multidimensional database systems. ACM Trans. Database Syst. 5(3), 339–353 (1980)
G.S. Lueker, A data structure for orthogonal range queries, in Proceedings of the 19th Annual Symposium on Foundations of Computer Science (SFCS 1978) (IEEE, 1978), pp. 28–34
V. Mäkinen, G. Navarro, Position-restricted substring searching, in Proceedings of the 7th Latin American Symposium on Theoretical Informatics (LATIN 2006) (Springer, 2006), pp. 703–714
G.M. Morton, A Computer Oriented Geodetic Data Base and a New Technique in File Sequencing (International Business Machines Company, New York, 1966)
G. Navarro, Wavelet trees for all. J. Discrete Algorithms 25, 2–20 (2014)
G. Navarro, Compact Data Structures: A Practical Approach (Cambridge University Press, 2016)
Y. Okajima, K. Maruyama, Faster linear-space orthogonal range searching in arbitrary dimensions, in Proceedings of the 17th Workshop on Algorithm Engineering and Experiments (ALENEX 2015) (SIAM, 2015), pp. 82–93
J. O'Rourke, J.E. Goodman, Handbook of Discrete and Computational Geometry (CRC Press, 2004)
R. Raman, V. Raman, S.R. Satti, Succinct indexable dictionaries with applications to encoding k-ary trees, prefix sums and multisets. ACM Trans. Algorithms 3(4), Article 43 (2007)
D.E. Willard, Predicate-Oriented Database Search Algorithms. Technical report (Harvard University, Cambridge, MA, 1978)
Rights and permissions
Open Access This chapter is licensed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license and indicate if changes were made.
The images or other third party material in this chapter are included in the chapter's Creative Commons license, unless indicated otherwise in a credit line to the material. If material is not included in the chapter's Creative Commons license and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder.
Copyright information
Â© 2022 The Author(s)
Ishiyama, K., Sadakane, K. (2022). Orthogonal Range Search Data Structures. In: Katoh, N., et al. Sublinear Computation Paradigm. Springer, Singapore. https://doi.org/10.1007/9789811640957_8
Publisher Name: Springer, Singapore
Print ISBN: 9789811640940
Online ISBN: 9789811640957