Trip planning queries with location privacy in spatial databases

Abstract

Privacy has become a major concern for the users of location-based services (LBSs) and researchers have focused on protecting user privacy for different location-based queries. In this paper, we propose techniques to protect location privacy of users for trip planning (TP) queries, a novel type of query in spatial databases. A TP query enables a user to plan a trip with the minimum travel distance, where the trip starts from a source location, goes through a sequence of points of interest (POIs) (e.g., restaurant, shopping center), and ends at a destination location. Due to privacy concerns, users may not wish to disclose their exact locations to the location-based service provider (LSP). In this paper, we present the first comprehensive solution for processing TP queries without disclosing a user’s actual source and destination locations to the LSP. Our system protects the user’s privacy by sending either a false location or a cloaked location of the user to the LSP but provides exact results of the TP queries. We develop a novel technique to refine the search space as an elliptical region using geometric properties, which is the key idea behind the efficiency of our algorithms. To further reduce the processing overhead while computing a trip from a large POI database, we present an approximation algorithm for privacy preserving TP queries. Extensive experiments show that the proposed algorithms evaluate TP queries in real time with the desired level of location privacy.

References

  1. Ahmadi, E., Nascimento, M.A.: A mixed breadth-depth first search strategy for sequenced group trip planning queries. In: MDM, pp 24–33 (2015)
  2. California dataset: http://www.cs.utah.edu/_lifeifei/spatialdataset.htm.
  3. Cao, X., Chen, L., Cong, G., Xiao, X.: Keyword-aware optimal route search. PVLDB 5(11), 1136–1147 (2012)
  4. Chen, H., Ku, W., Sun, M., Zimmermann, R.: The multi-rule partial sequenced route query. In: SIGSpatial, p 10 (2008)
  5. Chow, C., Mokbel, M.F., Aref, W.G.: Casper*: query processing for location services without compromising privacy. ACM Trans. Database Syst. 34(4) (2009)
  6. Duckham, M., Kulik, L.: A formal model of obfuscation and negotiation for location privacy. In: Pervasive, pp 152–170 (2005)
  7. Ghinita, G.: Private queries and trajectory anonymization: a dual perspective on location privacy. Trans. Data Privacy 2(1), 3–19 (2009)
  8. Ghinita, G., Kalnis, P., Khoshgozaran, A., Shahabi, C., Tan, K.-L.: Private queries in location based services: anonymizers are not necessary. In: SIGMOD, pp 121–132 (2008)
  9. Gruteser, M., Grunwald, D.: Anonymous usage of location-based services through spatial and temporal cloaking. Commun. ACM, 31–42 (2003)
  10. Guttman, A.: R-trees: a dynamic index structure for spatial searching. In: SIGMOD, pp 47–57 (1984)
  11. Hashem, T., Barua, S., Ali, M.E., Kulik, L., Tanin, E.: Efficient computation of trips with friends and families. In: MDM, pp 931–940 (2015)
  12. Hashem, T., Hashem, T., Ali, M.E., Kulik, L.: Group trip planning queries in spatial databases (2013)
  13. Hashem, T., Kulik, L.: “Don’t trust anyone”: privacy protection for location-based services. Pervasive Mob. Comput. 7, 44–59 (2011)
  14. Hashem, T., Kulik, L.: Safeguarding location privacy in wireless ad-hoc networks. In: Ubicomp, pp 372–390 (2007)
  15. Hashem, T., Kulik, L., Zhang, R.: Privacy preserving group nearest neighbor queries. In: EDBT, pp 489–500 (2010)
  16. Hashem, T., Kulik, L., Zhang, R.: Countering overlapping rectangle privacy attack for moving knn queries. Inf Syst. 38(3), 430–453 (2013)
  17. Hjaltason, G.R., Samet, H.: Ranking in spatial databases. In: SSD, pp 83–95 (1995)
  18. Hjaltason, G.R., Samet, H.: Distance browsing in spatial databases. ACM TODS 24(2), 265–318 (1999)
  19. Hu, H., Lee, D.L.: Range nearest-neighbor query. IEEE TKDE 18(1), 78–91 (2006)
  20. Hu, H., Xu, J.: Non-exposure location anonymity. In: ICDE, pp 1120–1131 (2009)
  21. Indyk, P., Woodruff, D.: Polylogarithmic private approximations and efficient matching. IEEE Pervasive Comput., 245–264 (2006)
  22. Khoshgozaran, A., Shahabi, C.: Blind evaluation of nearest neighbor queries using space transformation to preserve location privacy. In: SSTD, pp 239–257 (2007)
  23. Kido, H., Yanagisawa, Y., Satoh, T.: An anonymous communication technique using dummies for location-based services. In: ICPS, pp 88–97 (2005)
  24. Li, F., Cheng, D., Hadjieleftheriou, M., Kollios, G., Teng, S.: On trip planning queries in spatial databases. In: SSTD, pp 273–290 (2005)
  25. Li, Y., Yang, W., Dan, W., Xie, Z.: Keyword-aware dominant route search for various user preferences. In: DASFAA, pp 207–222 (2015)
  26. Microsoft. Location & privacy: Where are we headed? 2011 (accessed September 2, 2011). http://www.microsoft.com/privacy/dpd
  27. Mokbel, M.F., Chow, C.-Y., Aref, W.G.: The new casper: query processing for location services without compromising privacy. In: VLDB, pp 763–774 (2006)
  28. Nutanong, S., Zhang, R., Tanin, E., Kulik, L.: The v*-diagram: a query-dependent approach to moving KNN queries. PVLDB 1(1), 1095–1106 (2008)
  29. Ohsawa, Y., Htoo, H., Sonehara, N., Sakauchi, M.: Sequenced route query in road network distance based on incremental euclidean restriction. In: DEXA, pp 484–491 (2012)
  30. Papadopoulos, S., Bakiras, S., Papadias, D.: Nearest neighbor search with strong location privacy. PVLDB 3(1), 619–629 (2010)
  31. Samrose, S., Hashem, T., Barua, S., Ali, M.E., Uddin, M.H., Mahmud, M.I.: Efficient computation of group optimal sequenced routes in road networks. In: MDM, pp 122–127 (2015)
  32. Shang, S., Ding, R., Yuan, B., Xie, K., Zheng, K., Kalnis, P.: User oriented trajectory search for trip recommendation. In: EDBT, pp 156–167 (2012)
  33. Shang, S., Ding, R., Zheng, K., Jensen, C.S., Kalnis, P., Zhou, X.: Personalized trajectory matching in spatial networks. VLDB J. 23(3), 449–468 (2014)
  34. Shang, S., Liu, J., Zheng, K., Lu, H., Pedersen, T.B., Wen, J.: Planning unobstructed paths in traffic-aware spatial networks. GeoInformatica 19(4), 723–746 (2015)
  35. Sharifzadeh, M., Kolahdouzan, M., Shahabi, C.: The optimal sequenced route query. VLDB J. 17(4), 765–787 (2008)
  36. Wang, S., Lin, W., Yang, Y., Xiao, X., Zhou, S.: Efficient route planning on public transportation networks: a labelling approach. In: SIGMOD, pp 967–982 (2015)
  37. Yiu, M.L., Jensen, C.S., Huang, X., Lu, H.: Spacetwist: managing the trade-offs among location privacy, query performance, and query accuracy in mobile services. In: ICDE, pp 366–375 (2008)
  38. Yiu, M.L., Jensen, C.S., Møller, J., Lu, H.: Design and analysis of a ranking approach to private location-based services. ACM TODS 36(2), 10 (2011)
  39. Zhu, A.D., Ma, H., Xiao, X., Luo, S., Tang, Y., Zhou, S.: Shortest path and distance queries on road networks: towards bridging theory and practice. In: SIGMOD, pp 857–868 (2013)
  40. Zhu, A.D., Xiao, X., Wang, S., Lin, W.: Efficient single-source shortest path and distance queries on large graphs. In: SIGKDD, pp 998–1006 (2013)

Author information

Corresponding author

Correspondence to Tanzima Hashem.

Appendices

Appendix A: Detailed algorithm: false location

In this section, we present the detailed algorithm to evaluate a PkTP query based on a false location. Algorithm 1, PkTP_false, shows the pseudocode to process a PkTP query; it runs on the user’s mobile device. The inputs of the algorithm are k, the required types \(\{1, 2, \dots , m\}\), the user’s actual source s and destination d, and the obfuscation level \(o_{l}\). The output is \(R_{s}= \{{p_{1}^{1}}, {p_{2}^{1}}, \dots , {p_{m}^{1}}\},\{{p_{1}^{2}}, {p_{2}^{2}}, \dots , {p_{m}^{2}}\},\dots ,\{{p_{1}^{k}}, {p_{2}^{k}}, \dots , {p_{m}^{k}}\}\), that is, k sets of POIs that have the k smallest trip distances from s to d.

The notations that we have used for this algorithm are as follows:

  • MinD[1..k]: An array of k entries, where MinD[j] represents the j-th smallest trip distance from s to d via POIs of the required types, for 1 ≤ j ≤ k. MinD[j] is updated with the incremental retrieval of POIs from the LSP. In addition, MinD[k] represents the length of the major axis of the search region (ellipse).

  • \(r_{f}\): The distance between f and the farthest retrieved POI from f. It also represents the radius of the known region (circle) centered at f.

  • IsInsideKnownRegion(s, d, MinD[k], f, \(r_{f}\)): A function that checks whether the known region (circle) covers the search region (ellipse) and returns yes or no accordingly.

  • \(P^{\prime }\): A set that stores all POIs retrieved from the LSP, which fall inside the search region.

[Algorithm 1: PkTP_false (pseudocode figure not included)]

The first step of our algorithm is the computation of the false location f from the user’s source s and destination d. Any irreversible technique to compute the false location can be used for this purpose. In our implementation, we use the following technique. First, an elliptical area is computed with its two foci at s and d. The length of the major axis is selected randomly from a range that varies from the distance between s and d to the maximum distance between two points in the total space, such that the four extreme points of the ellipse remain within the total space. Then a false location is chosen randomly on the boundary of the elliptical area. Intuitively, if s, d and f are close to each other, POIs are retrieved at low cost. To reduce the query processing cost, the length of the major axis can also be set by the user instead of being chosen randomly. The output array \(R_{s}\) is initialized with {}, the entries of the distance array MinD are initialized with infinity, and \(r_{f}\) is initialized with 0.
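As a minimal sketch of the false-location step described above, the following Python function picks a point on the boundary of an ellipse whose foci are s and d. The function name, the tuple representation of points, and the use of a uniformly random parametric angle are assumptions made for illustration; the paper does not specify how the boundary point is sampled, and the check that the ellipse stays inside the total space is omitted here.

```python
import math
import random

def false_location(s, d, major_axis, rng=random):
    """Return a random point on the ellipse with foci s and d.

    Hypothetical helper: s and d are (x, y) tuples and major_axis is the
    full length of the major axis (at least the distance between s and d).
    """
    focal = math.dist(s, d)
    if major_axis < focal:
        raise ValueError("major axis must be at least dist(s, d)")
    a = major_axis / 2.0                       # semi-major axis
    c = focal / 2.0                            # centre-to-focus distance
    b = math.sqrt(max(a * a - c * c, 0.0))     # semi-minor axis
    cx, cy = (s[0] + d[0]) / 2.0, (s[1] + d[1]) / 2.0
    phi = math.atan2(d[1] - s[1], d[0] - s[0])  # orientation of the major axis
    # Assumption: a uniform parametric angle (not uniform in arc length).
    t = rng.uniform(0.0, 2.0 * math.pi)
    x, y = a * math.cos(t), b * math.sin(t)
    # Rotate by phi and translate to the ellipse centre.
    return (cx + x * math.cos(phi) - y * math.sin(phi),
            cy + x * math.sin(phi) + y * math.cos(phi))
```

By construction, the distances from the returned point to s and to d sum to the chosen major-axis length, which is the defining property of the ellipse boundary.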

Next, a query is sent to the LSP using a function INN with the parameters f, k, m and flag. INN incrementally retrieves the nearest POIs from f. Any existing nearest neighbor algorithm [17] can be used for the function INN. In the first call, flag = 0 and INN returns at least k POIs of one type and at least one POI of each remaining type, which are sufficient to form k initial POI sets. In subsequent calls, flag = 1 and INN returns k POIs, which can be of any of the required types \(1,2,\dots ,m\). In Line 8, the POIs retrieved with respect to f are stored in P, so P always contains the latest POIs retrieved from the LSP. For a dense distribution of POIs, the known region may expand slowly and require a large number of communication rounds between the user and the LSP. To avoid such a scenario, the user may set the number of POIs retrieved incrementally to k + δ instead of k in Function INN, where δ is a positive integer. The parameter \(r_{f}\) is updated as the distance between f and the farthest POI from f in P.

After retrieving POIs from the LSP, Function IsInsideSearchRegion checks the newly retrieved POIs in P and prunes any POI that falls outside the overlapping region of the known region and the search region. In the first iteration, no POI is pruned because the search region has not yet been computed, i.e., MinD[k] is \(\infty \).
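The pruning test can be sketched as follows, assuming Euclidean distances and (x, y) tuples for points. A POI lies in the search region when the sum of its distances to the two foci s and d does not exceed the major-axis length MinD[k], and in the known region when it is within \(r_{f}\) of f. The function and parameter names are illustrative, not taken from Algorithm 1.

```python
import math

def is_inside_search_region(p, s, d, min_d_k, f, r_f):
    """Return True if POI p lies in both the known circle and the search ellipse."""
    in_known = math.dist(f, p) <= r_f                         # circle centred at f
    in_search = math.dist(s, p) + math.dist(p, d) <= min_d_k  # ellipse with foci s, d
    return in_known and in_search
```

With `min_d_k = math.inf` the ellipse test always passes, which matches the statement that no POI is pruned in the first iteration.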

Then the algorithm computes S, the set of all new eligible candidate sets of POIs that can be formed from the POIs in P and \(P^{\prime }\), using the function GenerateSet. Note that P and \(P^{\prime }\) store the POIs retrieved from the LSP in the current iteration and in previous iterations, respectively, that fall inside the search region. In addition to computing S, the function GenerateSet also updates \(P^{\prime }\) by adding the POIs in P to \(P^{\prime }\). The pseudocode for GenerateSet is shown in Algorithm 2 (see Appendix A.1).

For each set \(S^{\prime } \in S\), the algorithm computes the trip distance \(Tdist(s,d,S^{\prime })\) and updates \(R_{s}\) and MinD if \(Tdist(s,d,S^{\prime })< MinD[k]\). At this stage, the function IsInsideSearchRegion again prunes the POIs of \(P^{\prime }\), because MinD[k] may have decreased and the search region may therefore have shrunk: a POI that was previously inside the search region may now lie outside it. At the end of each iteration, the function IsInsideKnownRegion checks whether the known region (circle) covers the search region (ellipse). If it does, the loop terminates; otherwise the loop continues and repeats the process by retrieving more POIs from the LSP. When the known region covers the search region, the k optimal POI sets have been found and are stored in \(R_{s}\).
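The termination test can be approximated with a short sketch. The version below samples points on the ellipse boundary and checks that each lies within the circle; because both regions are convex, covering the boundary implies covering the interior. Boundary sampling is an assumption made here for simplicity and is only approximate; the paper does not state how IsInsideKnownRegion is implemented, and an exact test would compute the farthest boundary point analytically.

```python
import math

def is_inside_known_region(s, d, major_axis, f, r_f, samples=720):
    """Approximate check that the circle (f, r_f) covers the ellipse with foci s, d."""
    a = major_axis / 2.0                  # semi-major axis
    c = math.dist(s, d) / 2.0             # centre-to-focus distance
    if math.isinf(a) or a < c:
        return False                      # no finite search region yet
    b = math.sqrt(a * a - c * c)          # semi-minor axis
    cx, cy = (s[0] + d[0]) / 2.0, (s[1] + d[1]) / 2.0
    phi = math.atan2(d[1] - s[1], d[0] - s[0])
    for i in range(samples):
        t = 2.0 * math.pi * i / samples
        x, y = a * math.cos(t), b * math.sin(t)
        p = (cx + x * math.cos(phi) - y * math.sin(phi),
             cy + x * math.sin(phi) + y * math.cos(phi))
        if math.dist(f, p) > r_f:
            return False                  # a boundary point is still unknown
    return True
```

While MinD[k] is still infinite the function returns False, so the retrieval loop keeps running until k trips have been found and the circle has grown past the ellipse.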

After that, the algorithm checks whether the privacy level achieved so far is greater than or equal to \(o_{l}\) using the function CompObLev. The incremental retrieval of POIs from the LSP continues, expanding the known region, until the user-specified privacy level \(o_{l}\) is achieved. To compute the obfuscation level exactly, CompObLev would need to check whether a source-destination pair within the known region satisfies the termination condition of the search, and repeat this test for all possible source-destination pairs within the known region, which is computationally very expensive. Thus, to make the process faster, in our proposed approach the user approximates her obfuscation level with Monte Carlo simulation. We randomly generate 1 million source-destination pairs within the known region and compute the percentage of pairs that satisfy the termination condition, and thereby approximate the area of the known region that adversaries can consider as the refined location of the user. Finally, we compute the obfuscation level as the percentage of the total space covered by the area considered as the user’s refined location.
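A rough sketch of the Monte Carlo approximation is given below. It rests on several assumptions that are not spelled out in the text: the termination test is passed in as a caller-supplied predicate (here called `satisfies_termination`, a hypothetical name), pairs are sampled uniformly over the area of the known circle, and the refined area is taken to be the circle area scaled by the fraction of pairs that pass. The default number of trials is kept small for illustration, whereas the paper uses 1 million pairs.

```python
import math
import random

def approx_obfuscation_level(f, r_f, total_area, satisfies_termination,
                             trials=10_000, rng=random):
    """Estimate the obfuscation level as a percentage of the total space."""
    def random_point_in_circle():
        # The square root makes the sample uniform over the disc's area.
        r = r_f * math.sqrt(rng.random())
        theta = rng.uniform(0.0, 2.0 * math.pi)
        return (f[0] + r * math.cos(theta), f[1] + r * math.sin(theta))

    hits = 0
    for _ in range(trials):
        s, d = random_point_in_circle(), random_point_in_circle()
        if satisfies_termination(s, d):
            hits += 1
    # Assumption: refined area = fraction of passing pairs times the circle area.
    refined_area = (hits / trials) * math.pi * r_f ** 2
    return 100.0 * refined_area / total_area
```

The client would compare the returned value with \(o_{l}\) and request more POIs from the LSP while the estimate is below the requested level.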

A.1 GenerateSet

[Algorithm 2: GenerateSet (pseudocode figure not included)]

The GenerateSet function takes the parameters m, P, \(P^{\prime }\), MinD[k] and flag, and returns all eligible candidate sets of POIs from the retrieved dataset. Note that for sequenced kTP queries, the algorithm respects the given sequence while generating the sets; if the sequence is not fixed, the algorithm considers all possible sequences of visiting POIs. Algorithm 2 describes the detailed steps of the function.

In the first call, flag = 0 and \(P^{\prime }\) is empty. Thus, the sets are generated from the POIs in P with the function GetSet and stored in S, and P is copied into \(P^{\prime }\).

In subsequent calls, flag = 1, P contains the latest POIs retrieved from the LSP, and \(P^{\prime }\) contains the previously retrieved POIs. For each POI p ∈ P, all possible sets are generated from p and the other POIs in \(P^{\prime }\) with the function GetSet, the sets are added to S, and p is added to \(P^{\prime }\).
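The incremental set generation can be illustrated with a small sketch for the sequenced case, where a candidate set holds exactly one POI per type in the order 1..m. The data layout (POIs as `(type, name)` pairs, previously seen POIs as a dictionary keyed by type) and the function name are assumptions for illustration. For brevity the sketch treats the first call like any other call with an empty \(P^{\prime }\), and it omits the distance-based filtering against MinD[k].

```python
from itertools import product

def generate_set(m, new_pois, seen):
    """Return the new candidate sets formed by each new POI, updating `seen` in place.

    new_pois: list of (type, name) pairs retrieved in this round (P).
    seen:     dict mapping type -> list of names retrieved so far (P').
    """
    candidates = []
    for t, p in new_pois:
        # Fix p in its own slot and combine it with seen POIs of the other types.
        slots = [[p] if u == t else seen.get(u, []) for u in range(1, m + 1)]
        candidates.extend(product(*slots))
        seen.setdefault(t, []).append(p)  # p becomes part of P'
    return candidates
```

Because each new POI is added to `seen` immediately, every combination that involves at least one new POI is produced exactly once, and combinations made only of old POIs are never regenerated.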

After generating the sets, OneSetDist calculates the distance for each set of POIs. For example, if a set is \(\{p_{1}, p_{2}, p_{3}\}\), the distance is computed as \(dist(p_{1}, p_{2}) + dist(p_{2}, p_{3})\). The distances of eligible sets are also stored to avoid recomputation in future trip distance computations (not shown in Algorithm 2).

For better understanding, Table 3 presents a simulation of the function GenerateSet with an example scenario for k = 1 and m = 2. The table is divided into two parts based on the value of flag (0 or 1). The first time GenerateSet is called, MinD[k] is \(\infty \) and flag is 0. The first part of the table assumes \(P=\{p_{1},p_{2},p_{2}^{\prime }\}\). After the execution of GetSet, S includes the sets \(\{(p_{1},p_{2}),(p_{1},p_{2}^{\prime })\}\) (as shown in Line 3 of Table 2). Then \(P^{\prime }\) keeps a copy of P for future use. In Line 12.(a), for the set \(\{(p_{1}, p_{2})\}\), the distance calculated by OneSetDist is less than MinD[k] because MinD[k] is infinity. Thus, the condition of Line 13.(a) is false. The same holds for Lines 12.(b) and 13.(b). Hence no set is removed from S, and it still contains \(\{(p_{1},p_{2}),(p_{1},p_{2}^{\prime })\}\).

Table 3 GenerateSet: simulation with example

We assume that the trip distances via \((p_{1}, p_{2})\) and \((p_{1},p_{2}^{\prime })\) are 10 and 20, respectively. Thus, the next time GenerateSet is called, MinD[k] is 20 and flag is 1. The second part of the table assumes \(P=\{p_{1}^{\prime },p_{2}^{\prime \prime }\}\). In Line 7.(a), for \(p=p_{1}^{\prime }\), two sets \(\{(p_{1}^{\prime },p_{2}),(p_{1}^{\prime },p_{2}^{\prime \prime })\}\) are generated and stored in S. After that, \(p_{1}^{\prime }\) is added to \(P^{\prime }\), which stores all the retrieved POIs. Following the same procedure for \(p_{2}^{\prime \prime }\), S contains the sets \(\{(p_{1}^{\prime },p_{2}),(p_{1}^{\prime },p_{2}^{\prime \prime }), (p_{1},p_{2}^{\prime \prime }),(p_{1}^{\prime },p_{2}^{\prime \prime })\}\) and all the POIs have been added to \(P^{\prime }\). Assume that for the set \(\{(p_{1}^{\prime },p_{2})\}\) in Line 13.(a), the OneSetDist distance, say 30, is greater than MinD[k]. Thus, the condition is true and this set is removed from S. S now contains \(\{(p_{1}^{\prime },p_{2}^{\prime \prime }), (p_{1},p_{2}^{\prime \prime }),(p_{1}^{\prime },p_{2}^{\prime \prime })\}\) in Line 14.(a). A similar process continues for each remaining set in S. After all unnecessary sets have been pruned, S finally contains \(\{(p_{1},p_{2}^{\prime \prime })\}\).

Appendix B: Detailed algorithm: cloaked location

In this section, we present our algorithm to evaluate a PkTP query based on the cloaked location of the user. Algorithm 3, PkTP_cloaked_User, shows the pseudocode to process a PkTP query; it runs on the user’s mobile device. As in Algorithm 1, the inputs of the algorithm are k, the required types \(\{1, 2, \dots , m\}\), the user’s actual source s and destination d, and the obfuscation level \(o_{l}\). The output is \(R_{s}= \{{p_{1}^{1}}, {p_{2}^{1}}, \dots , {p_{m}^{1}}\},\{{p_{1}^{2}}, {p_{2}^{2}}, \dots , {p_{m}^{2}}\},\dots ,\{{p_{1}^{k}}, {p_{2}^{k}}, \dots , {p_{m}^{k}}\}\), that is, k sets of POIs that have the k smallest trip distances from s to d.

[Algorithm 3: PkTP_cloaked_User (pseudocode figure not included)]

In the first step of the algorithm, the function GenerateRectangle generates the source and destination rectangles \(s_{r}\) and \(d_{r}\) based on the user-defined obfuscation level \(o_{l}\). We use the algorithm proposed in [13, 14] to randomly compute the rectangles according to the privacy requirements. Then the algorithm retrieves a candidate answer set that includes the optimal answers for all possible source-destination pairs in \(s_{r}\) and \(d_{r}\), respectively, using the function PkTP_cloaked_LSP (discussed in detail later in this section). The candidate POIs are stored in P.

In Line 5, the function ComputeSet is called to generate all possible sets from the POIs in P, and the sets are stored in S. After that, for each set \(s^{\prime }\) in S, the algorithm computes the trip distance from s to d. If the trip distance is less than MinD[k], the answer set \(R_{s}\) and the distance array MinD are updated. Since MinD[k] may have been updated, the algorithm uses the function PruneSet to check whether some sets can be pruned from S to reduce the computational overhead. The function PruneSet removes a set \(\{{p^{1}_{1}}, {p^{2}_{2}},...,{p^{i}_{m}}\}\) from S if \({\sum }_{i=1}^{m-1}{dist(p_{i}, p_{i+1})}\) is already greater than MinD[k]. Finally, \(R_{s}\) contains the k sets of POIs that minimize the trip distance from s to d.
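The client-side evaluation of the candidate sets can be sketched as follows, assuming Euclidean distances, POI sets given as tuples of (x, y) points in visiting order, and a trip distance that adds the legs from s to the first POI and from the last POI to d. The function names are illustrative. The inter-POI sum is a lower bound on the trip distance, so a set whose sum already exceeds MinD[k] can be skipped without computing its full trip distance, which is the idea behind PruneSet.

```python
import math

def one_set_dist(pois):
    """Sum of distances between consecutive POIs in the set."""
    return sum(math.dist(a, b) for a, b in zip(pois, pois[1:]))

def tdist(s, d, pois):
    """Trip distance from s through the POIs (in order) to d."""
    return math.dist(s, pois[0]) + one_set_dist(pois) + math.dist(pois[-1], d)

def top_k_trips(s, d, candidate_sets, k):
    """Return (R_s, MinD, number of pruned sets) for the k shortest trips."""
    min_d = [math.inf] * k   # MinD[1..k], kept in ascending order
    r_s = [None] * k         # POI sets matching the entries of min_d
    pruned = 0
    for pois in candidate_sets:
        if one_set_dist(pois) > min_d[-1]:
            pruned += 1      # PruneSet-style test: cannot beat MinD[k]
            continue
        t = tdist(s, d, pois)
        if t < min_d[-1]:
            i = next(j for j in range(k) if t < min_d[j])
            min_d.insert(i, t)
            r_s.insert(i, pois)
            min_d.pop()      # drop the entry pushed out of the top k
            r_s.pop()
    return r_s, min_d, pruned
```

For example, with s = (0, 0), d = (10, 0) and k = 2, a set whose two POIs are 20 units apart is skipped once two trips shorter than 20 are known.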

Algorithm 4, PkTP_cloaked_LSP, shows the detailed steps of the LSP-side algorithm. A set \(P^{\prime }\), initialized with \(\emptyset \), stores the POIs that provide the k smallest trip distances with respect to all possible source-destination pairs in \(s_{r}\) and \(d_{r}\), respectively. The inputs to the algorithm are \(s_{r},d_{r},k,\{1, 2, \dots , m\}\) and the output is \(P^{\prime }\).

[Algorithm 4: PkTP_cloaked_LSP (pseudocode figure not included)]

The notations that we have used for this algorithm are summarized below:

  • \(s_{c}\) (\(d_{c}\)): The center of the source (destination) rectangle \(s_{r}\) (\(d_{r}\)) and one of the foci of the search region (ellipse).

  • \(m_{p}\): The midpoint of \(s_{c}\) and \(d_{c}\). It is also the center of the known region (circle).

  • \(d_{1}\) (\(d_{2}\)): The Euclidean distance from \(s_{c}\) (\(d_{c}\)) to a corner point of \(s_{r}\) (\(d_{r}\)).

  • MinD[1..k]: An array of k entries, where MinD[j] represents the j-th smallest trip distance from \(s_{c}\) to \(d_{c}\) via POIs of the required types, for 1 ≤ j ≤ k.

  • \(d^{\prime }\): The sum of \(d_{1}\), \(d_{2}\) and MinD[k]. It also represents the length of the major axis of the search region (ellipse).

  • \(r_{m_{p}}\): The radius of the known region centered at \(m_{p}\).

  • IsInsideKnownRegion(\(s_{c},d_{c},d^{\prime },m_{p},r_{m_{p}}\)): A function that checks whether the known region (circle) covers the search region (ellipse) and returns yes or no accordingly.
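The geometric quantities in the list above can be computed with a few lines of code. This sketch assumes axis-aligned rectangles given as `(xmin, ymin, xmax, ymax)` tuples and Euclidean distances; the function name and return layout are illustrative only.

```python
import math

def cloaked_search_region(s_r, d_r, min_d_k):
    """Return (s_c, d_c, m_p, d_prime) for the source and destination rectangles."""
    # Centres of the rectangles: the foci of the search ellipse.
    s_c = ((s_r[0] + s_r[2]) / 2.0, (s_r[1] + s_r[3]) / 2.0)
    d_c = ((d_r[0] + d_r[2]) / 2.0, (d_r[1] + d_r[3]) / 2.0)
    # Midpoint of the centres: the centre of the known circle.
    m_p = ((s_c[0] + d_c[0]) / 2.0, (s_c[1] + d_c[1]) / 2.0)
    # Half-diagonals: the farthest any point of a rectangle is from its centre.
    d1 = math.dist(s_c, (s_r[2], s_r[3]))
    d2 = math.dist(d_c, (d_r[2], d_r[3]))
    # Major-axis length of the search ellipse, enlarged by d1 + d2.
    d_prime = d1 + d2 + min_d_k
    return s_c, d_c, m_p, d_prime
```

Adding \(d_{1}\) and \(d_{2}\) to MinD[k] widens the ellipse to account for the fact that the actual source and destination may lie anywhere in their rectangles rather than at the centers.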

Similar to Algorithm 1, this algorithm uses a function INN with the parameters f, k, \(m_{p}\) and flag to incrementally retrieve the nearest POIs with respect to \(m_{p}\). In the first call, flag = 0 and INN returns at least k POIs of one type and at least one POI of each remaining type, which are sufficient to form k initial POI sets. In subsequent calls, flag = 1 and INN returns k POIs, which can be of any of the required types \(1,2,\dots ,m\). The POIs retrieved with respect to \(m_{p}\) are stored in P, so P always contains the latest POIs retrieved by INN. Note that with every call of INN, the radius of the known region \(r_{m_{p}}\) increases. Function IsInsideSearchRegion in Line 8 prunes a POI in P if it falls outside the overlapping region of the known region and the search region. Note that MinD[k] is initially \(\infty \) and thus no POI from P is pruned in the first iteration.

Algorithm 4 also uses the function GenerateSet (described in Section A.1) to generate the possible candidate sets and stores them in S. For each set \(S^{\prime }\) in S, the algorithm calculates the trip distance \(Tdist(s_{c},d_{c},S^{\prime })\) for the source-destination pair \(s_{c}\) and \(d_{c}\). If \(Tdist(s_{c},d_{c},S^{\prime })< MinD[k]\), the array MinD is updated. In Line 15, the algorithm updates the length of the major axis \(d^{\prime }\) of the search region and uses the function IsInsideSearchRegion to prune the POIs from \(P^{\prime }\) that are not included in the overlapping region of the current known and search regions.

Finally, the algorithm checks whether the current known region covers the search region using the function IsInsideKnownRegion. If it does, the algorithm returns \(P^{\prime }\) to PkTP_cloaked_User. Otherwise, the algorithm repeats the process to identify the candidate answer set that includes the POIs for the k smallest trip distances with respect to all possible source-destination pairs in \(s_{r}\) and \(d_{r}\), respectively.

About this article

Cite this article

Soma, S.C., Hashem, T., Cheema, M.A. et al. Trip planning queries with location privacy in spatial databases. World Wide Web 20, 205–236 (2017). https://doi.org/10.1007/s11280-016-0384-2

Keywords

  • Location-based services
  • Privacy
  • Trip planning queries