1 Introduction

The widespread use of mobile devices in recent years has opened the door to the growth of the location-based services (LBSs) market. LBSs provide customized services for users based on their current locations. A user sends a request to the LBS server either as a single request (a snapshot query) or as a continuous query that is evaluated repeatedly over a given interval. Directions and navigation applications, weather applications, venue finders, social games, and crowd-sensing applications are examples of LBSs [1].

Providing such context-aware services requires the users to share their private location data. Sharing a user’s location data means revealing the user’s presence at a specific time and location to the service provider. Furthermore, using geographic maps and background knowledge to analyze the shared accurate locations of the user allows one to infer sensitive information about his/her lifestyle, home and work addresses, and daily activities [2]. Even if it is assumed that service providers have no intention to misuse the collected location data, their servers are still prone to malicious attacks aimed at this valuable data. According to [1], the location privacy field has four open challenges: quantifying location privacy, the need for new protection mechanisms, the lack of large datasets, and the need to raise user awareness about over-sharing.

To preserve location privacy, many location privacy preserving mechanisms (LPPMs) have been developed which reduce or blur the user’s location resolution prior to sending it to the service provider [1,2,3]. The caveat is that this blurring reduces query accuracy and may lower the utility level, depending on the characteristics of the requested service. For example, sending a modified version of the user’s location, ten kilometers away from the accurate location, still provides a weather forecast service with an acceptable utility level, while using the same modified location for a taxi booking service reduces its utility significantly. While these mechanisms have demonstrated effective performance with snapshot queries, this work shows that preserving location privacy for continuous queries should be addressed differently.

In this paper, we aim to answer the following question: how can location privacy be preserved for continuous queries? To accomplish that, we propose the MOPROPLS framework. First, a novel set of requirements is proposed as part of the framework, such that any LPPM should meet these requirements to provide location privacy for continuous queries. Second, a novel location privacy leakage metric is proposed to measure the location privacy leakage of continuous queries. Third, a novel two-phased probabilistic candidate selection algorithm is proposed to preserve privacy for continuous queries, considering the correlation between the obfuscated locations. Comparing the MOPROPLS framework with the geo-indistinguishability LPPM in terms of privacy (adversary estimation error) shows an average improvement of 34%.

The rest of the paper is organized as follows. In Sect. 2, we present the related work. The proposed set of requirements is discussed in Sect. 3. In Sect. 4, we discuss the proposed framework. Section 5 presents the framework evaluation. Finally, Sect. 6 concludes the paper.

2 Related Work

To preserve the user’s location privacy, many LPPMs have been proposed. These mechanisms reduce or blur the resolution of the user’s location shared with the service provider [1,2,3]. Generally, LPPMs are classified into k-anonymity, obfuscation, mix zone, and cryptography-based schemes.

K-anonymity is a concept which addresses the privacy issue in different domains. It depends on creating a set of k objects which are indistinguishable from each other. In the location privacy domain, k-anonymity is implemented by creating cloaking regions, geographic areas each containing k users [4,5,6,7,8]. The accurate location in the user’s query is then replaced by the cloaking region. Although generating effective cloaking regions for snapshot queries is feasible, reducing the cost of continuously extending the cloaking region to resist potential attacks on continuous queries requires knowledge about the user’s movements, which is not available in all cases [9, 10].

In addition to cloaking regions, position dummies are used to achieve k-anonymity [11,12,13,14]. The user’s request includes the accurate location of the user and k−1 fake locations (i.e., position dummies). In addition to the overhead resulting from the additional fake locations, position-dummy mechanisms are vulnerable to adversaries who exploit geographic maps and background knowledge to identify the fake locations and infer the user’s accurate location.

The mix zone is another principle used to preserve location privacy [15,16,17,18,19,20,21,22]. In this principle, a set of zones is defined, such that a user suspends sending queries to the LBS provider while located inside one of them. When multiple users enter a mix zone during the same time interval, all of them suspend their queries and their identities are mixed. Once a user leaves the mix zone, he/she resumes sending queries to the service provider. The drawback of mix zones is the extensive analysis of users’ motion patterns and road network characteristics required to construct effective zones [23]. In addition, blocking the queries inside the mix zone limits the service usage and has a negative impact on it.

Many proposed mechanisms use cryptographic tools to preserve location privacy. These mechanisms are designed for specific types of LBSs. In [24], symmetric-key cryptography is used to preserve location privacy for a nearby-friends finding service. In [25], BMobishare uses Bloom filters to preserve location privacy for social location-enabled services. Private Information Retrieval is a cryptography-based protocol proposed for querying a Point of Interest (PoI) database without allowing the database server to identify the targeted data item [26, 27]. The drawback of cryptography-based schemes is that they support only particular kinds of LBSs. Moreover, the high computation cost of these mechanisms limits their usage.

The aim of obfuscation-based LPPMs is to preserve location privacy by reducing the user’s location accuracy before sharing it with the service provider [3, 28,29,30,31,32,33,34]. Coordinate system transformations, caching, and generating obfuscation areas are examples of these mechanisms. In [35], an LPPM is proposed which takes into consideration the capabilities and knowledge of the adversary; a user profile represents the adversary’s knowledge as the probability distribution of the user accessing the service from a given location. A differential privacy based mechanism is proposed for location datasets in [36]. In [37], geo-indistinguishability is proposed to apply the concept of differential privacy to the location privacy field. The mechanism adds random noise to the user’s location using the planar Laplacian distribution. In [38], an elastic distinguishability metric is introduced to handle the variation in density between different areas and adjust the noise accordingly, providing the same level of privacy everywhere. For continuous queries, noise-based obfuscation mechanisms are vulnerable to correlation analysis attacks [39]. For area-based obfuscation, analyzing the intersection between the areas can allow an adversary to infer the accurate location of the user [30].
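For concreteness, the planar Laplace mechanism underlying geo-indistinguishability [37] can be sampled in closed form using the Lambert W function, as in the minimal Python sketch below. The metre-to-degree conversion and the chosen epsilon are illustrative assumptions, not values taken from [37].

```python
import numpy as np
from scipy.special import lambertw

def planar_laplace_noise(lat, lon, epsilon):
    """Sample an obfuscated location from the planar Laplace distribution
    centred at (lat, lon), using the inverse-CDF construction of [37].
    epsilon is the privacy parameter (here in 1/metre)."""
    theta = np.random.uniform(0, 2 * np.pi)   # random direction
    p = np.random.uniform(0, 1)               # random CDF value
    # radius from the inverse CDF, via the Lambert W function (branch -1)
    r = -(1.0 / epsilon) * (np.real(lambertw((p - 1) / np.e, k=-1)) + 1)
    # naive metres-to-degrees conversion; adequate only for small noise radii
    dlat = (r * np.sin(theta)) / 111_320.0
    dlon = (r * np.cos(theta)) / (111_320.0 * np.cos(np.radians(lat)))
    return lat + dlat, lon + dlon

# Example: most of the noise mass falls within roughly 200 m of the true point
noisy = planar_laplace_noise(37.7749, -122.4194, epsilon=np.log(4) / 200)
```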

3 Proposed Set of Requirements

Although using the LPPMs proposed in Sect. 2 for snapshot queries is straightforward, applying them to continuous queries is more complicated. In short, providing location privacy for a continuous query, which implies updating the service provider continuously about the user’s location, introduces a higher risk for the user’s location privacy than a snapshot query, which implies sharing the user’s location only once. With an LPPM, the user’s reported location is a modified version of his/her accurate location with reduced accuracy. To obtain a high location privacy level, the user can reduce the location accuracy significantly. Unfortunately, this is not applicable in all cases because of the potential loss of service utility. In practice, the user must choose a suitable reduction level, such that the query result is still useful when the service provider uses the reported location. Based on the service type and the reported location, the adversary can infer the potential area within which the user’s real location lies. Ideally, location privacy is preserved by hiding the accurate location throughout this entire protection area; that is, the adversary cannot use prior knowledge to reduce or shrink the area by eliminating some of its sub-areas. Therefore, the adversary’s ability to eliminate a sub-area from the protection area is considered a privacy leakage. We call this type of leakage intra-privacy leakage.

In inter-privacy leakage, the adversary reduces the user’s protection area by analyzing the consecutive protection areas produced by the user’s reported locations. This can be achieved by using the previous protection area to eliminate some sub-areas from the current protection area. The assumption is that the adversary can estimate the user’s traveled distance between two consecutive queries by measuring the elapsed time between them and estimating the user’s speed from the region where the user is located. Given the user’s traveled distance, a sub-area of the current protection area is eliminated when it is unreachable from every sub-area of the previous protection area, i.e., when its distance to all sub-areas of the previous protection area exceeds the user’s traveled distance. As a result, inter-privacy leakage shrinks the protection area and thereby reduces the location privacy level.
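To make the elimination step concrete, the following minimal Python sketch removes from the current protection area every sub-area that is unreachable from all sub-areas of the previous protection area, given the estimated traveled distance. Representing sub-areas by their centroids and the helper names are our own illustrative assumptions.

```python
from math import radians, sin, cos, asin, sqrt

def haversine(a, b):
    """Great-circle distance in metres between two (lat, lon) points."""
    lat1, lon1, lat2, lon2 = map(radians, (*a, *b))
    h = sin((lat2 - lat1) / 2) ** 2 + cos(lat1) * cos(lat2) * sin((lon2 - lon1) / 2) ** 2
    return 2 * 6_371_000 * asin(sqrt(h))

def prune_protection_area(current_subareas, previous_subareas, travelled_distance):
    """Keep only the sub-areas of the current protection area that are
    reachable from at least one sub-area of the previous protection area.
    Sub-areas are represented by their centroids as (lat, lon) tuples."""
    return [c for c in current_subareas
            if any(haversine(c, p) <= travelled_distance for p in previous_subareas)]

# travelled_distance would be estimated as (elapsed time) x (typical speed in the region)
```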

In this work, we introduce a novel set of six requirements, which we believe any LPPM should meet in order to provide location privacy for continuous queries. These requirements are discussed in the following sub-sections.

(I) Independence from Third-Party Servers.

Some LPPMs depend on a third-party (trusted) server to preserve the location privacy of mobile users. Although this may be acceptable for snapshot queries, which involve sending the user’s accurate location only once to the third-party server, continuous queries require sending the user’s accurate location to the third-party server frequently. Even if the third-party server is assumed to be trusted, receiving such valuable data (the accurate locations of a large number of users) makes this server a target of attacks. In addition, there is no guarantee about the trust level of the third-party server. Therefore, we propose independence from a third-party server as one of the requirements that should be met by any LPPM targeting continuous queries, such that any query sent by the user’s mobile device contains a protected version of the user’s location.

(II) Ability to Provide Location Privacy Based on Different Required Utility Levels.

LPPMs should provide location privacy that can be customized according to the required utility level of the LBS. Instead of providing one location privacy level for all LBSs, meeting this requirement ensures that each service provider gets only the location accuracy needed for the service’s utility requirements, and hence reduces any unnecessary location privacy leakage. Although essential, this ability is missing in many LPPMs. It is particularly important for LPPMs that address continuous queries, as the privacy leakage in continuous queries is greater than in snapshot queries.

(III) Ability to Process User’s Queries in All Cases.

Some LPPMs are designed such that they must block the user’s queries in some situations to preserve location privacy, meaning the user must delay sending the query or simply drop it. Although this blocking is essential to preserve location privacy in these LPPMs, we believe that any LPPM addressing continuous queries should be designed to avoid such situations, since blocking the user from using the LBS affects the service usage negatively. Therefore, for continuous queries, such restrictions on service usage should be avoided.

(IV) Ability to Model and Proact to All Sources of Location Privacy Leakage.

Although many LPPMs are designed to adapt to one or both of the location privacy leakage sources, the adaptation is not systematic, owing to the lack of a proper privacy leakage metric. Many location privacy metrics have been proposed, but they aim to evaluate and compare the performance of different LPPMs. Without a metric designed explicitly to measure the privacy leakage produced by reporting a specific location point, it is difficult to build an efficient LPPM for continuous queries. Accordingly, this requirement highlights the need for a location privacy leakage metric that can measure the privacy leakage of any potential reported location. This metric must then be integrated into the LPPM so that its operations adapt to the measured privacy leakage in order to preserve the user’s location privacy.

(V) The Output of the LPPM Should be a Geographic Point, Not an Area.

Many LPPMs preserve the user’s location privacy by producing an area to be used instead of the user’s accurate location. Unfortunately, current LBS providers do not support this type of input, as they are developed to receive the user’s location as a geographic point. So, to ensure that an LPPM can be used without any change on the LBS provider side, its output must be a geographic point.

(VI) Independence from Any Intensive Computation.

Mobile devices are well known for their limited computation and power capabilities. So, an LPPM intended to run on such devices must avoid intensive computation. Many LPPMs are based on computationally intensive procedures; using public-key cryptosystems is one example. Therefore, one requirement of any LPPM that preserves the location privacy of continuous queries, which are expected to run for a long time on the user’s mobile device, is independence from any intensive computation.

Table 1 lists the LPPMs discussed in Sect. 2 and the extent to which each of them satisfies the requirements proposed in this work. The table shows that none of these LPPMs supports all the proposed requirements, which emphasizes the need for a new LPPM that meets them all.

Table 1. Current LPPMs and the proposed requirements they fail to meet.

4 Proposed Framework

The previous section shows that each LPPM has its own drawbacks. Geo-indistinguishability is an obfuscation-based mechanism which meets most of the proposed requirements. Despite its efficiency and simplicity, it lacks the capability to model and proact to the potential intra-privacy and inter-privacy leakage.

As a result, we propose an LPPM framework, namely MOPROPLS, which has the capability to model and proact to the potential intra-privacy and inter-privacy leakage. MOPROPLS is based on the geo-indistinguishability notion; combined with its handling of the two leakage sources, it thus fulfills all the requirements discussed in the previous section.

4.1 The Need for a New Privacy Leakage Metric

In this work, we aim to address the location privacy leakage of continuous queries. To achieve that, we propose a new privacy leakage metric, which captures the intra-privacy leakage and inter-privacy leakage of continuous queries. In Sub-Sect. 4.2, we discuss the preliminaries of the proposed metric, while Sub-Sect. 4.3 introduces our proposed privacy leakage metric.

4.2 Preliminaries

Let \( \mathcal{X} \) be the set of the user’s possible locations, and let \( x^{t} \in \mathcal{X} \) denote the user’s location at time \( t \). \( c_{gU}(a) \) denotes the user’s candidate generation function for an arbitrary location \( a \in \mathcal{X} \), and \( c_{gAdv}(a) \) denotes the adversary’s candidate generation function for an arbitrary location \( a \in \mathcal{X} \). \( c_{e}(a) \) denotes the candidate evaluation function for an arbitrary location \( a \in \mathcal{X} \). \( \mathcal{C}^{t} \) is the set of all possible candidate locations for \( x^{t} \), such that \( \mathcal{C}^{t} = c_{gU}(x^{t}) \) and \( c_{i}^{t} \in \mathcal{C}^{t} \), \( i = 1, 2, \ldots, m \). \( z^{t} \) is the user’s reported location at time \( t \), such that \( z^{t} = c_{s}(\mathcal{C}^{t}) \) and \( z^{t} \in \mathcal{C}^{t} \), where \( c_{s} \) denotes the candidate selection function. \( d \) is the distance function, \( l \) is the obfuscation level parameter, and \( plm \) denotes the privacy leakage metric, with \( plm_{intra} \) the intra-privacy leakage function and \( plm_{inter} \) the inter-privacy leakage function. The user profile is defined by \( prof_{exist}(a) \), the user’s existence probability at an arbitrary location \( a \), and \( prof_{move}(a,b) \), the user’s movement probability from an arbitrary location \( a \) to an arbitrary location \( b \). \( \dot{\mathcal{C}}_{i}^{t} \) is the set of all possible candidate locations for \( c_{i}^{t} \), such that \( \dot{\mathcal{C}}_{i}^{t} = c_{gAdv}(c_{i}^{t}) \) and \( \dot{c}_{i,j}^{t} \in \dot{\mathcal{C}}_{i}^{t} \), \( j = 1, 2, \ldots, n \). \( \ddot{\mathcal{C}}^{t-1} \) is the set of all possible candidate locations for \( z^{t-1} \), such that \( \ddot{\mathcal{C}}^{t-1} = c_{gAdv}(z^{t-1}) \) and \( \ddot{c}_{k}^{t-1} \in \ddot{\mathcal{C}}^{t-1} \), \( k = 1, 2, \ldots, q \).
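As a minimal illustration of how this notation could be realized in a simulation, the Python sketch below models locations as grid cells and the user profile as dictionaries. The grid representation, the class, and the helper names are illustrative assumptions rather than the paper’s implementation.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple

Location = Tuple[int, int]   # a discrete cell of the location set X

@dataclass
class UserProfile:
    """prof_exist(a) and prof_move(a, b) from the preliminaries."""
    exist: Dict[Location, float] = field(default_factory=dict)
    move: Dict[Tuple[Location, Location], float] = field(default_factory=dict)

    def prof_exist(self, a: Location) -> float:
        return self.exist.get(a, 0.0)

    def prof_move(self, a: Location, b: Location) -> float:
        return self.move.get((a, b), 0.0)

def c_g(a: Location, level: int) -> List[Location]:
    """Candidate generation: all cells within `level` grid steps of a
    (used for both c_gU and c_gAdv in this sketch)."""
    x, y = a
    return [(x + dx, y + dy)
            for dx in range(-level, level + 1)
            for dy in range(-level, level + 1)]
```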

4.3 Proposed Privacy Leakage Metric

Given a user’s location \( x^{t} \), the privacy leakage metric of a user’s candidate location \( c_{i}^{t} \), where \( c_{i}^{t} \in c_{gU}(x^{t}) \), is defined as:

$$ plm\left( {c_{i}^{t} } \right) = \frac{1}{2}\left( {plm_{intra} \left( {c_{i}^{t} } \right) + plm_{inter} \left( {c_{i}^{t} } \right)} \right) $$
(1)

The intra-privacy leakage function, \( plm_{intra} \left( {c_{i}^{t} } \right) \) is defined as:

$$ plm_{intra} \left( {c_{i}^{t} } \right) = 1 - \frac{1}{n} \mathop \sum \limits_{j = 1}^{n} prof_{exist} \left( {\dot{c}_{i,j}^{t} } \right) $$
(2)

where \( \dot{c}_{i,j}^{t} \in c_{gAdv} \left( {c_{i}^{t} } \right) \).

The inter-privacy leakage function, \( plm_{inter} \left( {c_{i}^{t} } \right) \) is defined as:

$$ plm_{inter} \left( {c_{i}^{t} } \right) = 1 - \frac{1}{n} \mathop \sum \limits_{j = 1}^{n} dig\left( { \mathop \sum \limits_{k = 1}^{q} {\mathbb{1}}\left[ {d\left( {\dot{c}_{i,j}^{t} , \ddot{c}_{k}^{t - 1} } \right) \le d\left( {x^{t} ,x^{t - 1} } \right)} \right] } \right) $$
(3)

where \( \dot{c}_{i,j}^{t} \in c_{gAdv} \left( {c_{i}^{t} } \right) \), \( \ddot{c}_{k}^{t - 1} \in c_{gAdv} \left( {z^{t - 1} } \right) \), \( {\mathbb{1}}\left[ \cdot \right] \) is the indicator function that equals 1 when its condition holds and 0 otherwise, and \( dig\left( v \right) = 1 \) if \( v > 0 \) and \( dig\left( v \right) = 0 \) if \( v = 0 \).
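A direct, hedged translation of Eqs. (1)–(3) into Python is given below. It reuses the `UserProfile` and `c_g` helpers sketched above and assumes a caller-supplied distance function `dist`; these names are illustrative, not part of the original tool.

```python
def plm_intra(c_i, profile, level):
    """Eq. (2): one minus the mean existence probability over the
    adversary-generated candidates of c_i."""
    cands = c_g(c_i, level)                      # c_gAdv(c_i)
    return 1.0 - sum(profile.prof_exist(c) for c in cands) / len(cands)

def plm_inter(c_i, z_prev, x_t, x_prev, level, dist):
    """Eq. (3): the fraction of adversary candidates of c_i that are
    unreachable from every adversary candidate of the previous reported
    location z_prev, given the travelled distance d(x^t, x^{t-1})."""
    cands = c_g(c_i, level)                      # c_gAdv(c_i)
    prev_cands = c_g(z_prev, level)              # c_gAdv(z_prev)
    travelled = dist(x_t, x_prev)
    reachable = sum(
        1 for c in cands
        if any(dist(c, p) <= travelled for p in prev_cands)   # dig(...) = 1
    )
    return 1.0 - reachable / len(cands)

def plm(c_i, z_prev, x_t, x_prev, profile, level, dist):
    """Eq. (1): the average of the intra- and inter-privacy leakage."""
    return 0.5 * (plm_intra(c_i, profile, level)
                  + plm_inter(c_i, z_prev, x_t, x_prev, level, dist))
```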

4.4 MOPROPLS Framework

In this work, we propose a framework, namely MOdeling and PROacting to Privacy Leakage Sources (MOPROPLS). Our MOPROPLS framework has the capability to model and proact to all sources of privacy leakage using our proposed location privacy leakage metric. In addition, the framework includes a novel two-phased probabilistic candidate selection algorithm that takes into consideration the correlation between the obfuscated locations in order to preserve privacy for continuous queries. MOPROPLS consists of the following components: obfuscation candidates generation, obfuscation candidates evaluation, and two-phased probabilistic obfuscation candidate selection (Fig. 1).

Fig. 1. MOPROPLS framework.
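Since the selection rule itself is not detailed in this section, the sketch below only illustrates one plausible way to wire the three components together, reusing the helpers from Sect. 4.2: candidates are generated, evaluated with the proposed metric, and then sampled with probability proportional to one minus their measured leakage. This weighting is an assumption for illustration and should not be read as the actual two-phased algorithm of MOPROPLS.

```python
import random

def obfuscate(x_t, z_prev, x_prev, profile, level, dist):
    # Obfuscation candidates generation: c_gU(x^t)
    candidates = c_g(x_t, level)
    # Obfuscation candidates evaluation: privacy leakage of each candidate
    leakage = [plm(c, z_prev, x_t, x_prev, profile, level, dist)
               for c in candidates]
    # Probabilistic selection, illustrated here as sampling a candidate with
    # probability proportional to (1 - leakage); the actual two-phased rule
    # of MOPROPLS is not specified in this section.
    weights = [max(1.0 - l, 1e-9) for l in leakage]
    return random.choices(candidates, weights=weights, k=1)[0]
```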

5 Evaluation and Results

5.1 Framework Implementation

We extended the LPM2 tool [40] to implement our framework as a simulation tool. In addition to the MOPROPLS framework components, the tool includes an experiments manager, which consists of a dataset loading and parsing manager, a user profile manager, and an inference attacks manager.

5.2 Dataset

To evaluate the proposed framework, we conducted experiments on a real-world dataset. Epfl/mobility is a GPS trajectory dataset of more than 500 taxi cabs in San Francisco, USA [41], covering more than thirty days. Twenty cabs were selected randomly from the dataset, and their trajectories were used in the experiments.
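Assuming the usual cabspotting file layout of one `<latitude> <longitude> <occupancy> <unix time>` record per line, newest first (an assumption about the raw files, not something stated above), a trajectory can be loaded with a sketch like this:

```python
def load_trajectory(path):
    """Parse one cab file of the epfl/mobility dataset, assuming the
    cabspotting format '<lat> <lon> <occupancy> <unix_time>' per line.
    Returns the points sorted by time."""
    points = []
    with open(path) as f:
        for line in f:
            lat, lon, occupancy, ts = line.split()
            points.append((int(ts), float(lat), float(lon), int(occupancy)))
    return sorted(points)
```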

5.3 Inference Attack

The prior knowledge of an adversary about an LBS user ranges from no prior knowledge to full prior knowledge. In this paper, we focus on the extreme case: an adversary with full prior knowledge about the user. Such an adversary is assumed to have full knowledge about the user’s presence and transitions in the space, formulated as a user profile. The user profile contains the probability of the user’s presence at any particular location x and the user’s transition probability from any location x to any location y, and it is constructed from the user’s real trajectories. The aim of the inference attack is then to estimate the user’s real location based on the received obfuscated locations and the known user profile.
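The sketch below shows, under the grid assumptions from Sect. 4.2, how such a full profile could be estimated from the real trajectory and then used in a simple maximum-probability estimate of the real location. The estimator is only illustrative and is not claimed to be the exact inference attack used in the evaluation.

```python
from collections import Counter, defaultdict

def build_profile(cells):
    """cells: the user's real trajectory discretised into grid cells.
    Returns a UserProfile with existence and transition probabilities."""
    profile = UserProfile()
    counts = Counter(cells)
    total = len(cells)
    profile.exist = {c: n / total for c, n in counts.items()}
    trans = defaultdict(Counter)
    for a, b in zip(cells, cells[1:]):
        trans[a][b] += 1
    profile.move = {(a, b): n / sum(nexts.values())
                    for a, nexts in trans.items() for b, n in nexts.items()}
    return profile

def estimate_location(z_t, x_hat_prev, profile, level):
    """Illustrative inference: pick the candidate of the reported location
    that maximises the existence probability weighted by the transition
    probability from the previous estimate (not the paper's exact attack)."""
    return max(c_g(z_t, level),
               key=lambda c: profile.prof_exist(c)
               * (profile.prof_move(x_hat_prev, c) if x_hat_prev else 1.0))
```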

5.4 Experimental Setup

The experiments were conducted on an Intel i5 (3.10 GHz) machine with 16 GB RAM running Ubuntu 17.04. The simulation area is 50 km × 88 km. Each experiment was repeated ten times, and the average of the ten measurements is reported as the experiment result. The results are discussed in the following section.

5.5 Experimental Scenarios and Results

To evaluate the performance of our framework, we compare the location privacy level achieved by MOPROPLS against the geo-indistinguishability mechanism. Five obfuscation levels were used in the experiments. The inference attack is run by the strong adversary, who has a full profile of each mobile user; the full profile represents full knowledge about the user’s presence and transitions in the space.

Scenario 1.

In scenario 1, the weak adversary attack and the strong adversary attack were used to run the inference attack against the geo-indistinguishability mechanism. The weak adversary uses partial prior knowledge, namely the user’s existence probability, while the strong adversary uses the full prior knowledge available, namely the user’s existence probability and the user’s movement probability. The objective of this experiment is to show the effect of using the user’s movement probability in reducing the user’s location privacy. The experiment shows that the strong adversary exploits the prior knowledge about the user’s movements to reduce the location privacy level.

For obfuscation level 1, the average location privacy achieved against the weak adversary is approximately 0.469 with a 95% confidence interval between 0.45 and 0.487, while the average location privacy achieved against the strong adversary is approximately 0.297 with a 95% confidence interval between 0.277 and 0.316. Therefore, the adversary’s use of the full prior knowledge reduces the location privacy by approximately 37%.

For obfuscation level 2, the average location privacy achieved against the weak adversary is approximately 0.573 with a 95% confidence interval between 0.553 and 0.594, while the average location privacy achieved against the strong adversary is approximately 0.43 with a 95% confidence interval between 0.401 and 0.459. It is worth noting that the location privacy level increases with the obfuscation level, which leads to the generation of more candidates. The adversary’s use of the full prior knowledge reduces the location privacy by approximately 25%.

For obfuscation levels 3, 4, and 5, the average location privacy achieved against the weak adversary is approximately 0.618 with a 95% confidence interval between 0.598 and 0.639, 0.645 with a 95% confidence interval between 0.625 and 0.665, and 0.663 with a 95% confidence interval between 0.642 and 0.683, respectively. The average location privacy achieved against the strong adversary for obfuscation levels 3, 4, and 5 is approximately 0.51 with a 95% confidence interval between 0.483 and 0.538, 0.571 with a 95% confidence interval between 0.545 and 0.598, and 0.627 with a 95% confidence interval between 0.605 and 0.649, respectively. Again, the location privacy level increases with the obfuscation level, which leads to the generation of more candidates. The adversary’s use of the full prior knowledge reduces the location privacy by approximately 17%, 11%, and 5% for obfuscation levels 3, 4, and 5, respectively.
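The reported reductions follow directly from the averages above; the short snippet below recomputes them as the relative drop from the weak-adversary value.

```python
def reduction(weak, strong):
    """Relative drop in location privacy when the strong adversary
    also exploits the movement probabilities (Scenario 1)."""
    return 100 * (weak - strong) / weak

pairs = [(0.469, 0.297), (0.573, 0.43), (0.618, 0.51),
         (0.645, 0.571), (0.663, 0.627)]
print([round(reduction(w, s)) for w, s in pairs])  # -> [37, 25, 17, 11, 5]
```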

Scenario 2.

In scenario 2, the strong adversary attack was used to run the inference attack against both the geo-indistinguishability mechanism and the MOPROPLS framework. The strong adversary uses the full prior knowledge available, namely the user’s existence probability and the user’s movement probability. The objective of this experiment is to evaluate the performance of our proposed framework.

For obfuscation level 1, the experiment shows that MOPROPLS outperforms geo-indistinguishability significantly, as a result of modeling and proacting to the privacy leakage sources. The average location privacy achieved by geo-indistinguishability is approximately 0.297 with a 95% confidence interval between 0.277 and 0.316, while the average location privacy achieved by MOPROPLS is approximately 0.577 with a 95% confidence interval between 0.56 and 0.593. Therefore, MOPROPLS increases the location privacy by approximately 94%.

For obfuscation level 2, the experiment shows that MOPROPLS provides a better level of location privacy than geo-indistinguishability. The average location privacy achieved by geo-indistinguishability is approximately 0.43 with a 95% confidence interval between 0.401 and 0.459, while the average location privacy achieved by MOPROPLS is approximately 0.615 with a 95% confidence interval between 0.595 and 0.635. The location privacy level increases for both mechanisms as a result of increasing the obfuscation level, which leads to the generation of more candidates. Overall, MOPROPLS increases the location privacy level by approximately 43%.

For obfuscation levels 3, 4, and 5, the experiment shows that MOPROPLS still provides a better level of location privacy than geo-indistinguishability. Increasing the obfuscation level improves location privacy for both mechanisms; at the same time, the effect of exploiting the measured privacy leakage becomes smaller as the obfuscation level increases. The average location privacy achieved by geo-indistinguishability for obfuscation levels 3, 4, and 5 is approximately 0.51 with a 95% confidence interval between 0.483 and 0.538, 0.571 with a 95% confidence interval between 0.545 and 0.598, and 0.627 with a 95% confidence interval between 0.605 and 0.649, respectively. The average location privacy achieved by MOPROPLS for obfuscation levels 3, 4, and 5 is approximately 0.623 with a 95% confidence interval between 0.605 and 0.642, 0.629 with a 95% confidence interval between 0.61 and 0.648, and 0.643 with a 95% confidence interval between 0.624 and 0.662, respectively. Using MOPROPLS increases the location privacy by approximately 22%, 10%, and 2% for obfuscation levels 3, 4, and 5, respectively. The average privacy improvement of MOPROPLS over all obfuscation levels is approximately 34% (Figs. 2 and 3).
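Likewise, the improvements reported for Scenario 2 and their 34% average follow from the averages above, as the quick recomputation below shows; the level-5 value rounds to about 3% here, close to the reported 2%, due to rounding of the underlying averages.

```python
def improvement(geo_ind, mopropls):
    """Relative gain of MOPROPLS over geo-indistinguishability (Scenario 2)."""
    return 100 * (mopropls - geo_ind) / geo_ind

gains = [improvement(g, m) for g, m in
         [(0.297, 0.577), (0.43, 0.615), (0.51, 0.623),
          (0.571, 0.629), (0.627, 0.643)]]
print([round(g) for g in gains])       # approximately [94, 43, 22, 10, 3]
print(round(sum(gains) / len(gains)))  # approximately 34
```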

Fig. 2. Comparing the location privacy average using WeakAdv and StrongAdv (all obfuscation levels).

Fig. 3. Location privacy average comparison.

6 Conclusion

In this paper, we proposed the MOPROPLS framework, which aims to preserve location privacy for continuous queries. The performance of the framework was compared with the geo-indistinguishability LPPM in terms of privacy (adversary estimation error), and the average reported improvement was 34%. For future work, we plan to extend the proposed framework to meet the requirements of preserving location privacy in smart city applications.