So far, we considered that alert zones are a fixed input to the system, and we provided data encodings and optimizations to reduce computational overhead under this constraint. Since alert zones were not modified, we maintained the amount of location disclosure to a minimum, i.e., an adversary could only learn whether a specific ciphertext corresponded to a location inside the alert zone or not. In this section, we consider a relaxation of the alert zone extent in order to improve performance. Specifically, given an input alert zone, we investigate whether it is possible to slightly enlarge it such that the resulting set of tokens needed to implement secure notification requires fewer bilinear pairings to evaluate.
To keep the level of additional disclosure low, we allow only a relatively small enlargement factor, expressed as a ratio of the alert zone area, and quantified by a bound parameter α. Given an enlargement factor α, our proposed alert zone expansion heuristic determines an enlarged area with significantly lower processing overhead. In effect, this proposed optimization trades a small amount of additional location disclosure for a significant boost in matching performance. As a salient feature of this optimization, the privacy-performance trade-off can be tuned using a single parameter (α).
The optimization is deployed at the TA, which is in charge of generating search tokens. In an actual deployment, since the TA is trusted, it can perform additional steps to check whether the enlarged zone is acceptable from a security standpoint, for instance by comparing it against a set of pre-defined policies. In this paper, we only focus on the performance aspect, and derive effective algorithms that quickly generate enlarged search tokens (the policy aspect is outside our scope). Similar to optimizations from prior sections, the zone expansion is guided by the objective of deriving tokens with fewer non-wildcard elements, which results in less computation. The expansion technique assumes the same hierarchical data domain representation considered so far, and works in conjunction with either hierarchical or Gray encodings.
We denote by base cell a cell in the leaf level of the hierarchical domain representation (recall that the domain is split into d × d base cells, where d is a power of two). The hierarchy has \(1 + \log d\) levels. At level k, an aggregate cell consists of \(2^{k} \times 2^{k}\) base cells. Specifically, at the leaf level, numbered as k = 0, each cell is a base cell, whereas at the top of the hierarchy (level \(\log d\)) there is a single cell with size d × d (expressed in terms of base cells). We identify a cell at level k by its coordinates within that level: \((x, y)_{k}\). The binary identifier of a cell consists of a binary string, which can be immediately derived from its coordinates.
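The level structure described above can be illustrated with a minimal Python sketch. The function names (`num_levels`, `cell_at_level`, `base_cells_covered`) are hypothetical and chosen only for illustration. The sketch assumes that a level-k cell is obtained from base coordinates by dropping the k low-order bits of each coordinate.

```python
from typing import List, Tuple


def num_levels(d: int) -> int:
    """Number of hierarchy levels, 1 + log2(d), for a d x d base grid."""
    if d < 1 or d & (d - 1):
        raise ValueError("d must be a power of two")
    # For a power of two, bit_length() equals 1 + log2(d).
    return d.bit_length()


def cell_at_level(x: int, y: int, k: int) -> Tuple[int, int]:
    """Coordinates (x, y)_k of the level-k cell containing base cell (x, y)."""
    return (x >> k, y >> k)


def base_cells_covered(x: int, y: int, k: int) -> List[Tuple[int, int]]:
    """All base cells covered by the level-k cell (x, y)_k (2^k x 2^k of them)."""
    side = 1 << k
    return [(bx, by)
            for bx in range(x * side, (x + 1) * side)
            for by in range(y * side, (y + 1) * side)]
```

For instance, with d = 8 the hierarchy has four levels, and the level-1 cell (3, 0) covers the base cells with x ∈ [6, 7] and y ∈ [0, 1].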
Algorithm 4 captures the main steps of the proposed heuristic alert zone expansion technique. The input consists of expansion factor α and initial alert zone A. The heuristic is given a maximum budget W = ⌊α|A|⌋ base cells that it can add to the initial zone, where |A| is the area of the initial zone expressed in terms of base grid cells. The output of Algorithm 4 is an expanded zone \(\hat {A}\) such that \(|\hat {A}| \leq |A| + W\) and the number of bilinear pairings required to evaluate \(\hat {A}\) is lower than that of A.
The ExpandZone routine (Algorithm 4) works by considering each level of the data domain hierarchy.
An essential step of ExpandZone is the SelectPatchesSingleLevel routine (detailed in Algorithm 9), which finds patches to add to the current set of zone cells (line 4). A patch (formally defined in Section 5.1) is a set of cells added to the zone in a single iteration. If the new set of zone cells, denoted as CrtZone, does not require more pairings than the current set of cells, an expansion is made with the cells in the patch and we continue to the next level.
In order to prepare for the next level (lines [9-18]), parameters and indices are adjusted. Budget W is reduced by a factor of 4, since in the next level the size of one cell is equal to 4 cells in the current level. Similarly, indices of zone cells are divided by two. The intuition behind dividing the indices by two is that all 2 × 2 areas containing zone cells in this level must be fully covered by the expansion, which means all cells in those areas are in CrtZone.
Algorithm 4 stops when one of the following conditions is met: (i) the new set of zone cells increases the number of pairings; (ii) budget W is exhausted; or (iii) the zone expands to the entire root level.
To illustrate the zone expansion algorithm, consider the example in Fig. 8. Zone cells are shown in grey color, and budget is set to W = 10. An area with \(x \in [x_{1}, x_{2}]\), \(y \in [y_{1}, y_{2}]\) at level k is denoted as \(R^{k}_{[x_{1}, x_{2}] \times [y_{1}, y_{2}]}\). Starting at level k = 0 (Fig. 8a), the 2 × 2 areas \(R^{0}_{[4, 5] \times [0, 1]}\), \(R^{0}_{[6, 7] \times [2, 3]}\), and \(R^{0}_{[4, 5] \times [4, 5]}\) are considered for expansion. All six cells with diagonal stripes are added to the current set of zone cells to fill those three areas. To prepare for expansion at the next level, the coordinate ids of zone cells must be adjusted for each 2 × 2 area. For example, for k = 1, area \(R^{0}_{[4, 5] \times [0, 1]}\) becomes cell \((2,0)_{1}\), area \(R^{0}_{[6, 7] \times [2, 3]}\) becomes cell \((3,1)_{1}\), and so on. The budget W is reduced to \(\lfloor \frac {10 - 6}{4}\rfloor = 1\). Next, at level k = 1 (Fig. 8b), the areas \(R^{1}_{[2, 3] \times [0, 1]}\) and \(R^{1}_{[2, 3] \times [2, 3]}\) are considered for expansion. The cells with diagonal stripes in range \(R^{1}_{[3, 3] \times [0, 0]}\) (which equals \(R^{0}_{[6, 7] \times [0, 1]}\) in the base grid) are added to the zone.
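The transition between levels in this example can be sketched in a few lines of Python. This is an illustrative sketch, not the paper's Algorithm 4: the helper name `prepare_next_level` is hypothetical, and the level-0 zone below is reconstructed only from the areas named in the text (the figure itself may contain details not captured here). It assumes every 2 × 2 area containing zone cells has already been filled before the transition.

```python
from typing import Set, Tuple

Cell = Tuple[int, int]


def prepare_next_level(zone: Set[Cell], budget: int, spent: int):
    """Map a fully expanded level-k zone to level k+1 and rescale the budget.

    zone   : zone cells at level k, with every touched 2x2 area completely filled
    budget : budget W available at level k (in level-k cells)
    spent  : number of level-k cells added during expansion at this level
    """
    # Halving the indices collapses each filled 2x2 area into one cell.
    next_zone = {(x // 2, y // 2) for (x, y) in zone}
    # One cell at level k+1 is worth four cells at level k.
    next_budget = (budget - spent) // 4
    return next_zone, next_budget


def full_area(x: int, y: int) -> Set[Cell]:
    """The four cells of the 2x2 area with origin (x, y)."""
    return {(x, y), (x + 1, y), (x, y + 1), (x + 1, y + 1)}


# Level-0 zone after expansion: the three filled areas plus the area
# [4,5] x [2,3], which the text describes as containing no non-zone cell.
expanded = full_area(4, 0) | full_area(6, 2) | full_area(4, 4) | full_area(4, 2)
level1_zone, level1_budget = prepare_next_level(expanded, budget=10, spent=6)
```

Under these assumptions the level-1 zone consists of cells (2, 0), (2, 1), (3, 1), and (2, 2), and the remaining budget is 1, which is exactly enough to add the single cell (3, 0) at level 1.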
Patch assembly
Next, we focus on the process of assembling patches at each level k of the data domain hierarchy.
A patch is a set of cells that can be combined with existing zone cells to reduce the number of non-wildcard elements in a search token.
We denote the cells belonging to a patch as attached cells, and the zone cells adjacent to the patch as attaching cells. A patch is associated with a local cost and gain: the cost measures the increase in alert zone area, whereas the gain quantifies the resulting reduction in bilinear pairing operations when the patch is added to the zone.
We consider as patch candidate each 2 × 2 cell area \(R^{k}_{[x, x + 1] \times [y, y + 1]}\) that satisfies the following conditions: (i) has even x and y coordinates, (ii) contains at least one zone cell, and (iii) has at least one non-zone cell. Revisiting the example in Fig. 8a, the area \(R^{0}_{[4, 5] \times [0, 1]}\) composed of 2 × 2 base cells is a patch candidate. Note that not all 2 × 2 cell areas are valid candidates for patches. For instance, \(R^{0}_{[5, 6] \times [0, 1]}\) has an odd x; \(R^{0}_{[6, 7] \times [0, 1]}\) does not contain any zone cell; and \(R^{0}_{[4, 5] \times [2, 3]}\) does not contain any non-zone cell.
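The three conditions translate directly into a small predicate. The following Python sketch uses a hypothetical function name (`is_patch_candidate`) and a hypothetical zone that is merely consistent with the statements above; it is not a transcription of Fig. 8a.

```python
from typing import Set, Tuple

Cell = Tuple[int, int]


def is_patch_candidate(x: int, y: int, zone: Set[Cell]) -> bool:
    """Check conditions (i)-(iii) for the 2x2 area with origin (x, y)."""
    # (i) the area must start at even x and y coordinates
    if x % 2 != 0 or y % 2 != 0:
        return False
    cells = [(x, y), (x + 1, y), (x, y + 1), (x + 1, y + 1)]
    n_zone = sum(1 for c in cells if c in zone)
    # (ii) at least one zone cell and (iii) at least one non-zone cell
    return 1 <= n_zone <= 3


# Hypothetical zone: area [4,5]x[2,3] is full, area [4,5]x[0,1] is partly
# covered, and area [6,7]x[0,1] contains no zone cell.
example_zone = {(4, 2), (4, 3), (5, 2), (5, 3), (4, 1), (5, 1)}
```

With this zone, the area with origin (4, 0) is a candidate, whereas origins (5, 0), (6, 0), and (4, 2) are rejected by conditions (i), (ii), and (iii), respectively.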
For each valid patch candidate \(R^{k}_{[x, x + 1] \times [y, y + 1]}\), cells are indexed in a spiral order, as shown in Fig. 9a. We use this indexing order because it simplifies the process of patch assembly, as will be described later in Section 5.2. In order to keep track of zone and non-zone cells, a boolean array marked is maintained, such that marked[i] = True if the i-th cell within \(R^{k}_{[x, x + 1] \times [y, y + 1]}\) is a zone cell, and marked[i] = False otherwise. Figure 9b shows a marked array for the area in Fig. 9a. The marked array is constructed by checking for each cell within the area whether or not it belongs to the alert zone. The marking procedure is summarized in Algorithm 5.
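A minimal sketch of the marking step is given below. The exact spiral order of Fig. 9a is not reproduced in the text, so the offsets in `SPIRAL_OFFSETS` are an assumption made for illustration: any order that walks around the 2 × 2 area gives the same useful property, namely that cells i and (i + 1) mod 4 are adjacent while cells i and (i + 2) mod 4 are diagonally opposite.

```python
from typing import List, Set, Tuple

Cell = Tuple[int, int]

# ASSUMED spiral order (the order in Fig. 9a may start elsewhere or run
# in the opposite direction): walk around the 2x2 area from its origin.
SPIRAL_OFFSETS = ((0, 0), (1, 0), (1, 1), (0, 1))


def mark_area(x: int, y: int, zone: Set[Cell]) -> List[bool]:
    """marked[i] is True iff the i-th cell (in spiral order) is a zone cell."""
    return [(x + dx, y + dy) in zone for dx, dy in SPIRAL_OFFSETS]


def are_adjacent(i: int, j: int) -> bool:
    """Two spiral indices denote edge-adjacent cells iff they differ by 1 mod 4."""
    return (i - j) % 4 in (1, 3)
```

For example, under the assumed order, an area with origin (4, 0) whose zone cells are (4, 1) and (5, 1) yields marked = [False, False, True, True], and the two zone cells have consecutive indices, which identifies them as adjacent without any coordinate comparison.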
Using the marked array, patch candidates are constructed such that one or more non-zone cells can be attached to zone cells to reduce the number of pairings. Figure 10 shows several examples of patches for an area with 2 × 2 cells containing one, two, or three zone cells. In each example, the non-zone cell (striped fill) is attached to the zone cell (grey fill) to form a patch. Note that in Fig. 10c, a striped cell can be attached to either grey cell.
However, when the area contains only one zone cell, although there are three potential patches, only one of these is selected (Fig. 10a). The reason is that if two patches, each having a single zone cell, are selected, the number of pairings is not reduced; on the other hand, if the patch with three zone cells is selected, there is no need to select other patches with a single zone cell. Therefore, for each area, we construct patch groups that include all potential patches such that no more than one patch can be selected from that group. For example, in Fig. 10a, there is only one patch group containing all three patches; in Fig. 10b and d, there is only one patch group containing one patch; in Fig. 10c, there are two patch groups, each containing one patch.
The GetPatchGroupsInsideArea routine (Algorithm 6) shows the details of constructing patches and patch groups. The algorithm handles separately each case based on the number of zone cells in the area. For a single zone cell (line 3), similar to the example in Fig. 10a, one patch group is constructed which includes two patches: one with one non-zone cell and another with all three non-zone cells. If there are two zone cells (line 7), the algorithm further considers if those two zone cells are adjacent or opposite (similar to Fig. 10b and c, respectively) and either one or two patch groups are created, corresponding to the two situations. Finally, when there are three zone cells (line 17), a single patch group is created.
At the end of Algorithm 6, each patch has its cells numbered from set {0, 1, 2, 3}. In order to recover the original cell ids (i.e., the coordinates in current level k of hierarchy), we use Algorithm 7, which takes as inputs a cell id i ∈ [0, 3] and the x, y values of the area \(R^{k}_{[x, x + 1] \times [y, y + 1]}\), and utilizes the spiral index to recover the original values.
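The recovery of original coordinates from a spiral index can be sketched as a table lookup. As before, the specific spiral order is an assumption (Fig. 9a is not reproduced in the text), and `recover_cell` is a hypothetical name rather than the identifier used in Algorithm 7.

```python
from typing import Tuple

# ASSUMED spiral order around the 2x2 area, starting at its origin.
SPIRAL_OFFSETS = ((0, 0), (1, 0), (1, 1), (0, 1))


def recover_cell(i: int, x: int, y: int) -> Tuple[int, int]:
    """Return the level-k coordinates of spiral index i in the area at (x, y)."""
    if not 0 <= i <= 3:
        raise ValueError("spiral index must be in [0, 3]")
    dx, dy = SPIRAL_OFFSETS[i]
    return (x + dx, y + dy)


# The four indices enumerate exactly the four cells of the area.
area_cells = {recover_cell(i, 4, 0) for i in range(4)}
```

Here `area_cells` contains the four cells of the area with x ∈ [4, 5] and y ∈ [0, 1].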
Next, we need to evaluate which patches are more desirable to use in the enlarged zone, by computing the local cost and gain for each patch. Algorithm 8 takes as inputs a candidate patch and the grid dimension \(d_{k}\) at current level k. It outputs as cost the number of attached cells (i.e., non-zone cells) of the patch (line 3). Effectively, the cost measures the amount of enlargement of the expanded alert zone caused by this patch. The gain measures the amount of saved computation: specifically, the number of search token non-wildcards that are eliminated when we combine the attached cells with the attaching cells for the current patch. There are two cases to consider when determining the gain of the patch: (i) when only one cell is attached to form a 1 × 2 patch (line 4), we can remove one non-wildcard element (line 5); (ii) when the entire 2 × 2 area is filled (line 6), the number of zone cells inside the area (i.e., \(n_{1}\)) is further considered to determine the gain. Specifically, when \(n_{1} = 3\), the gain is larger (2 × k) since we can remove a token in its entirety.
In the previous example from Fig. 8a, there are three patch groups corresponding to three areas: \(G_{1}\) for area \(R^{0}_{[4, 5] \times [0, 1]}\), \(G_{2}\) for area \(R^{0}_{[6, 7] \times [2, 3]}\), and \(G_{3}\) for area \(R^{0}_{[4, 5] \times [4, 5]}\). The patches in each patch group along with their cost, gain, attaching cells, and attached cells are shown in Table 1. For instance, to express area \(R^{0}_{[4, 5] \times [0, 1]}\), consider patch \(p_{1}\) of \(G_{1}\): before expansion, two tokens “00*110” and “00111*” are needed, with a total of 10 non-wildcard elements. By adding one cell, only a single token “00*11*” is needed to represent the area. Thus, the number of non-wildcard elements is reduced from 10 to 4, an improvement of 6. The high gain when applying patch \(p_{1}\) results not only from the number of non-wildcards reduced in one token, but also from the reduction in the number of tokens (as one of the initial tokens is completely eliminated).
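The arithmetic in this example can be checked mechanically. The short Python sketch below counts non-wildcard elements and expands each token into the binary identifiers it matches; the helper names are hypothetical, and the only inputs are the three token strings quoted above.

```python
from itertools import product
from typing import Iterable, Set


def non_wildcards(tokens: Iterable[str]) -> int:
    """Total number of non-wildcard elements over a set of tokens."""
    return sum(1 for t in tokens for ch in t if ch != "*")


def covered_ids(tokens: Iterable[str]) -> Set[str]:
    """All binary identifiers matched by at least one token."""
    ids = set()
    for t in tokens:
        # Each '*' may take either bit value; other positions are fixed.
        options = [("0", "1") if ch == "*" else (ch,) for ch in t]
        ids.update("".join(bits) for bits in product(*options))
    return ids


before = ["00*110", "00111*"]   # tokens for the original zone cells
after = ["00*11*"]              # single token after applying the patch

gain = non_wildcards(before) - non_wildcards(after)
added = covered_ids(after) - covered_ids(before)
```

The computed gain is 6, and `added` contains exactly one identifier ("000111"), which matches the statement that a single cell is attached.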
Table 1 Example of candidate patch groups for expanding the alert zone at level k = 0
Patch selection
Once we have the set of patches and patch groups, as well as their respective costs and gains, we need a method to select the actual patches to expand the current alert zone. Algorithm 9 outlines the patch selection process, which takes as inputs budget W, the grid dimension at current level \(d_{k}=\frac {d}{2^{k}}\), and current alert zone \(A_{k}\). It outputs a set of patches PSet that has total cost at most W and maximizes the gain compared to other candidate patches.
The selection algorithm works within an expanding search boundary determined by the call to routine FindExpandingBoundary in line 1 (FindExpandingBoundary is summarized in Algorithm 10: the boundaries consist of the maximum and minimum coordinate values of zone cells, and they always have even values). Then, for each 2 × 2 area starting with even values (lines [3–6] in Algorithm 9), if the current 2 × 2 area is a valid area to expand (line 8), the patches and patch groups within this area are constructed (according to the procedure detailed in Section 5.1). First, the cells that already belong to the current zone are marked by calling Algorithm 5 (line 9). Then, using the marking information, the set \(\mathcal {G}_{\mathit {inside}}\) of patch groups within that area is constructed by calling Algorithm 6 (line 10). Next, for each patch in the patch groups of \(\mathcal {G}_{\mathit {inside}}\), the original coordinate ids of cells in the attaching and attached set of that patch are recovered by calling Algorithm 7 (line 13), and the local cost and gain are calculated by calling Algorithm 8 (line 14). Finally, a set of patches PSet is selected for expansion by calling Algorithm 11 (line 16).
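The boundary computation and the enumeration of 2 × 2 areas can be sketched as follows. The text states only that the boundaries are the minimum and maximum zone coordinates and that they are always even. Rounding both down to the nearest even value (so that each bound is the origin of a 2 × 2 area) is an assumption of this sketch, and the function names are hypothetical.

```python
from typing import List, Set, Tuple

Cell = Tuple[int, int]


def find_expanding_boundary(zone: Set[Cell]) -> Tuple[int, int, int, int]:
    """Return (x_min, x_max, y_min, y_max), each rounded down to an even value."""
    xs = [x for x, _ in zone]
    ys = [y for _, y in zone]

    def even(v: int) -> int:
        # ASSUMPTION: bounds are snapped to the origin of the enclosing 2x2 area.
        return v - (v % 2)

    return even(min(xs)), even(max(xs)), even(min(ys)), even(max(ys))


def area_origins(zone: Set[Cell]) -> List[Cell]:
    """Origins of all 2x2 areas (even x, even y) inside the search boundary."""
    x_min, x_max, y_min, y_max = find_expanding_boundary(zone)
    return [(x, y)
            for x in range(x_min, x_max + 1, 2)
            for y in range(y_min, y_max + 1, 2)]
```

Each origin returned by `area_origins` would then be tested for validity, marked, and turned into patch groups as described above.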
The patches are selected such that the total cost is no more than W, the total gain is maximized, and no more than one patch is selected per group. This can be modeled as a variant of the multiple-choice knapsack problem (MCKP) in which a class is represented by a patch group, and at most one item may be chosen from each class, rather than being required to choose one. The reduction is as follows: given an instance of MCKP with capacity W, m classes, and each item j in class i having cost \(c_{i,j}\) and gain \(g_{i,j}\), for each class i a new item \(j^{\prime }\) (or patch in our setting) is added with cost \(c_{i,j^{\prime }} = 0\) and gain \(g_{i, j^{\prime }} = 0\). However, our patch selection problem remains tractable in practice, because W is restricted to a fraction of the alert zone, which in turn is restricted to a fraction of the entire grid; W is therefore bounded by the grid size, and a pseudo-polynomial dynamic program runs efficiently.
In our setting, each patch group contains either one or two patches. As a result, a dynamic programming approach for the traditional binary knapsack problem can be used. Algorithm 11 shows the dynamic programming solution that returns the selected patches for expansion. In the example summarized in Fig. 8 and Table 1, patches \(p_{1}\), \(p_{4}\), \(p_{5}\) are selected for expansion at level k = 0.
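One possible dynamic program for this selection problem is sketched below. It is an illustrative sketch and not a transcription of Algorithm 11: the function name `select_patches`, the `(cost, gain)` tuple representation, and the sample groups are all hypothetical (the sample values are not those of Table 1). The table is indexed by budget, and each group is processed once against the table of the previous groups, which enforces the constraint of at most one patch per group.

```python
from typing import List, Tuple

Patch = Tuple[int, int]            # (cost, gain)
Choice = Tuple[int, int]           # (group index, patch index)


def select_patches(groups: List[List[Patch]], budget: int) -> Tuple[int, List[Choice]]:
    """Maximize total gain with total cost <= budget, at most one patch per group."""
    # best[w] = (max gain, chosen patches) using total cost at most w
    best: List[Tuple[int, List[Choice]]] = [(0, []) for _ in range(budget + 1)]
    for g_idx, group in enumerate(groups):
        new_best = list(best)      # skipping the group is always allowed
        for w in range(budget + 1):
            for p_idx, (cost, gain) in enumerate(group):
                if cost <= w:
                    # Extend a solution of the PREVIOUS groups only, so that
                    # no second patch from this group can be added.
                    prev_gain, prev_choice = best[w - cost]
                    if prev_gain + gain > new_best[w][0]:
                        new_best[w] = (prev_gain + gain, prev_choice + [(g_idx, p_idx)])
        best = new_best
    return best[budget]


# Hypothetical patch groups, each with one or two (cost, gain) patches.
sample_groups = [[(1, 6), (3, 4)], [(2, 3)], [(3, 2)], [(5, 4)]]
total_gain, chosen = select_patches(sample_groups, budget=6)
```

With these sample values the sketch selects the first patch of groups 0, 1, and 2, for a total cost of 6 and a total gain of 11. The running time is proportional to the number of patches times W, which is consistent with the \(\mathcal{O}(N_{g} W)\) bound when each group holds at most two patches.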
Complexity analysis
The complexity of the alert zone expansion (Algorithm 4) depends on the complexity of the binary minimization step (line 7) in which the algorithm decides whether or not to continue expansion. In the worst case, Algorithm 4 needs to expand through all \(\log d\) levels of the hierarchy, and in each level it invokes Algorithm 9 and the binary minimization procedure (in our implementation, we use the Espresso tool [32]).
Since Algorithm 9 finds the patch groups within the boundary of the query, and the size of the alert zone is often much smaller than the size of the data domain, we formulate the complexity of Algorithm 9 based on the alert zone size. Let \(P_{k} = |A_{k}|\) be the number of cells of the alert zone at level k. After finding patch groups, Algorithm 9 invokes the dynamic programming solution in Algorithm 11 to select patches. In the worst case, the number of patch groups \(N_{g}\) at level k equals the number of cells \(P_{k}\) of the alert zone. In our setting, there are only one or two patches in each patch group. Hence, the complexity of Algorithm 11 becomes \(\mathcal {O}(N_{g} W) = \mathcal {O}(\alpha {P_{k}^{2}})\) since \(W = \alpha P_{k}\). Thus, the complexity of Algorithm 9 is \(\mathcal {O}(P_{k} + \alpha {P_{k}^{2}})\).
However, since the value of \(P_{k}\) is divided by a factor of 4 each time k increases, the complexity of the alert zone expansion (Algorithm 4) becomes \(\mathcal {O}(P_{0} + \alpha {P_{0}^{2}} + T_{\mathit {Es}}((1 + \alpha ) P_{0}) \log d)\) where \(P_{0}\) is the size of the zone at the base level (i.e., original grid) and \(T_{\mathit {Es}}(t)\) is the time to run the binary minimization procedure for t inputs.
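As a sanity check on the first two terms, the per-level costs can be summed explicitly. Under the simplifying assumption that \(P_{k} = P_{0}/4^{k}\) exactly, the contribution of Algorithm 9 over all levels is bounded by two geometric series:

\[
\sum_{k=0}^{\log d} \left( P_{k} + \alpha P_{k}^{2} \right)
\;\leq\; P_{0} \sum_{k \geq 0} \frac{1}{4^{k}} + \alpha P_{0}^{2} \sum_{k \geq 0} \frac{1}{16^{k}}
\;=\; \frac{4}{3} P_{0} + \frac{16}{15}\, \alpha P_{0}^{2}
\;=\; \mathcal{O}\!\left(P_{0} + \alpha P_{0}^{2}\right).
\]

The binary minimization term does not shrink in the same way in this bound: it is charged at most \(T_{\mathit{Es}}((1 + \alpha) P_{0})\) at each of the \(\log d\) levels, which yields the remaining \(T_{\mathit{Es}}((1 + \alpha) P_{0}) \log d\) term.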