Privacy Considerations in Participatory Data Collection via Spatial Stackelberg Incentive Mechanisms

Mobile crowd sensing is a widely used sensing paradigm allowing applications on mobile smart devices to routinely obtain spatially distributed data on a range of user attributes: location, temperature, video and audio. Such data then typically forms the input to application specific machine learning tasks to achieve objectives such as improving user experience, targeting geo-localised query based searches to user interests and commercial aspects of targeted geo-localised advertising. We consider a scenario in which the sensing application purchases data from spatially distributed smartphone users. In many spatial monitoring applications, the crowdsourcer needs to incentivize users to contribute sensing data. This may help ensure collected data has good spatial coverage, which will enhance quality of service provided to the application user when used in machine learning tasks such as spatial regression. Privacy considerations should be addressed in such crowd sensing applications, and an incentive offered to “privacy-concerned” users to contribute data. A novel Stackelberg incentive mechanism is developed that allows workers to specify their location whilst satisfying their location privacy requirements. The Stackelberg and Nash equilibria are explored and an algorithm to demonstrate the approach is developed for a real data application.


Introduction
Mobile crowd sensing is becoming a widely used sensing paradigm as it allows applications on smart devices to routinely and efficiently obtain spatially distributed data on a diverse range of user attributes. Such data then typically forms the input to a variety of machine learning tasks, with objectives such as improving the user experience of applications running on a smartphone or mobile device, targeting query-based searches to the user's particular interests and geographic location, and the commercial aspect of targeted advertising.
Previous studies on crowdsourcing data in machine learning contexts, such as Shah et al. (2015) and Shah and Zhou (2016), focused on the provision of labeled training data facilitated by crowdsourcing frameworks. In such works, the authors focused on questions pertaining to incentive mechanisms and developed voting schemes to improve the data collection in such frameworks, addressing practical concerns such as the quality of the crowdsourced labels. They argued that such settings were relevant in two contexts: when the application users (typically termed workers) performing the crowdsourced data labeling are not experts on the labeling task, and when the application interface is not adequately designed to ensure that workers can convey their knowledge of the labeling task accurately.
In this work we consider aspects of the crowdsourced data provision framework that have not previously been studied, but which we believe are of direct practical relevance to numerous applications in this emerging area of crowdsourced data provision for machine learning algorithms. We consider the situation in which the sensing application may incentivise or "buy" sensor data from spatially distributed mobile smartphone users (workers) instead of deploying its own sensor network. In such contexts, the geo-location of the sourced data is directly relevant to the machine learning task that the application will consider when providing content to its users. Furthermore, we consider an important aspect not previously addressed: the ability to incentivise crowdsourced data in the presence of workers who are privacy-aware and therefore require greater monetary incentives to forgo or relax their privacy concerns when supplying data to the crowdsourcing application.
The spatial aspect of such problems, where the geo-location of sourced data is of direct relevance, has previously been under-explored but is relevant to many unsupervised or semi-supervised machine learning applications. For instance, we explore a setting in which one may wish to perform a statistical task such as Gaussian process spatial field regression or estimation using crowdsourced data. Such tasks can then provide estimates of application-specific information to a user at the location of the smartphone, based on a statistical model calibrated to previously sourced, spatially distributed data. Examples of such applications include localized climate information, traffic density information and local pollution levels.
Such mobile crowd sensing platforms (Guo et al. 2015) have also become an emerging sensing paradigm in the age of the Internet of Things (IoT), replacing fixed sensing infrastructure and removing its deployment and maintenance costs. The interface between such IoT architectures and statistical machine learning tasks is an exciting new area beginning to emerge, which will require ideas such as those addressed in this manuscript to facilitate data sourcing.
As noted in Murakami et al. (2016), such sensing platforms can also exploit the growth of smartphone ownership by tapping into smartphone users to contribute sensing data that are easily obtained via the sensing capabilities of their devices. This allows the sensing platform to use this spatially distributed data to perform important practical tasks such as estimating statistics of a spatial event or conducting spatial regression. Other examples of mobile crowd sensing applications for spatial monitoring include environmental temperature monitoring (Overeem et al. 2013), traffic monitoring (Thiagarajan et al. 2009) and earthquake detection in early warning systems (Minson et al. 2015). All of these application domains have benefited from the fact that mobile crowdsourced data improves the spatial coverage of the collected dataset and therefore the resolution of spatial regression model estimates.
In our framework, we consider the aspect of spatial coverage as one of the main objectives of an incentive mechanism used by spatial monitoring applications.
Additionally, current privacy-preserving works such as Nissim et al. (2012), Yang et al. (2013), and Singla and Krause (2013) have attempted to address the user location privacy problem in the crowd sensing domain. This matters because privacy issues can easily deter potential users from participating, which in turn reduces both the amount of available data and the spatial coverage of the data used by the crowdsourcer.

Location-Privacy Preservation and Incentivised Crowdsourced Data
Existing privacy-preserving works that offer location privacy via location or data perturbation are not directly applicable to crowd sensing applications that require specific and true locations. For example, it would be unacceptable for a traffic monitoring application if there was traffic congestion on road X but, due to location or data perturbation, another road Y or a non-congested status was reported, respectively. Thus, it is vital for incentive models to address the spatial coverage and location privacy issues concurrently.
The most common incentive mechanism is the auction game; see examples in Yang et al. (2012), Nissim et al. (2012), Singla and Krause (2013) and Restuccia et al. (2016). In such settings the smartphone workers submit bids (e.g., prices or efforts in Luo et al. (2016b)) for their sensing data, while the crowdsourcer selects the set of workers with the lowest bids and pays them accordingly.
An alternative and fundamentally different game-theoretic approach to model the incentive problem is the Stackelberg leader-follower game where the crowdsourcer (leader) first decides on the total reward to pay workers (i.e., the leader's budget constraint) while the workers (followers) then individually decide on the amount of data to contribute, see Yang et al. (2012) and Luo et al. (2016a).
However, in the system models used in Yang et al. (2012) and Luo et al. (2016a), the crowdsourcer first informs the workers of the total reward offered, and the workers subsequently compute the optimal amount of data to contribute (which may not be intuitive or practically feasible for smartphone users). In contrast, we allow the crowdsourcer to specify the exact reward offered to each individual worker (see stage 2 in Fig. 2). In this way, the workers can be assured that the offered rewards always correspond to their optimal (Nash equilibrium) solution, and there is no need to share information on the workers' costs and privacy preferences among competing workers.
Fig. 1 Privacy model of users: privacy-concerned users can declare a cloaking region that contains their true location instead of providing fine-grained location information to the crowdsourcer
Privacy-aware incentive mechanisms were considered in Nissim et al. (2012), Yang et al. (2013), and Singla and Krause (2013), which used data perturbation or dummy locations (Kido et al. 2005) to protect location privacy. However, these methods may not be applicable to many spatial monitoring applications, where a dataset with perturbed data or dummy locations may trigger a false alarm and make the application unreliable. In this paper, we allow workers to obfuscate their precise location information by declaring a coarse-grained region (see Fig. 1) that encompasses their true location (similar to the location cloaking principle in Cheng et al. (2006) and Andrés et al. (2013)). This step is important as privacy concerns can deter potential workers from participating in the crowd sensing activity. In addition, the work in Lin et al. (2014) found that smartphone users were more willing to provide coarse-grained location information than fine-grained information. Hence, our Stackelberg model may still buy data (albeit with lower-quality location information) from privacy-sensitive workers to improve the spatial coverage of the collected dataset. Furthermore, the cloaking region technique is more practical for real-world applications that require reliable information: although the location information of the workers may be imprecise due to the location cloaking, it is still accurate as there is no data perturbation or dummy location involved.
Additionally, existing Stackelberg models assume that workers sell undifferentiated goods (data) and consider neither the quality (e.g., granularity) of each worker's data nor the workers' contribution to the spatial coverage of the collected dataset. Adding location information increases the computational complexity of the problem, and thus Feng et al. (2014) proposed an auction-based approximation algorithm to assign the workers' sensing tasks. The authors assumed that the sensing platform periodically publishes sensing tasks for specific locations of interest and did not explicitly address the issue of improving the spatial coverage of the collected dataset in general.

Our Contributions
We demonstrate that an appropriate incentive mechanism to model the hierarchical relationship between the crowdsourcer and the smartphone users is the Stackelberg (leader-follower) game model used in Yang et al. (2012) and Luo et al. (2016a). In the Stackelberg model, the crowdsourcer (leader) commits to a reward strategy that is observed by the smartphone users (followers), who then strategize the amount of data to sell. However, existing Stackelberg incentive models simply select user data independently of their physical location (Feng et al. 2014) and do not attempt to improve the spatial coverage of the dataset.
Fig. 2 Interaction model between the crowdsourcer and smartphone users (workers). The user profile consists of the user's sensing cost incurred per unit of data $c_i$, location granularity $\rho_i$, and partitioned region $l_i$
Therefore, we propose extending the existing Stackelberg incentive models to include privacy awareness and to improve the spatial coverage of the collected dataset. Our model allows privacy-sensitive users to submit coarse-grained (or quantized) location information, which can still be useful to the crowdsourcer. We then study the properties of the proposed Stackelberg incentive model analytically and present efficient algorithmic solutions. Our proposed Stackelberg incentive model does not require a trusted third party for privacy and can protect users against a crowdsourcer who cannot be trusted to anonymize the smartphone users' location information. The current work extends our preliminary work in Koh et al. (2017) by accommodating bounds on the workers' sensing data, analyzing the existence, uniqueness and Pareto efficiency of the Stackelberg equilibrium, and conducting a real-world sensing case study to demonstrate the practicality of the proposed solution.
The main contributions of the paper are as follows:
• We propose a novel spatial privacy-aware Stackelberg incentive scheme that allows privacy-sensitive mobile smartphone users to quantize their location information using cloaking regions. Our proposed solution also seeks to improve the spatial coverage of the collected dataset.
• We prove that, under our Stackelberg incentive mechanism, the Followers game admits a unique Nash equilibrium, and we obtain a closed-form expression for each mobile smartphone user's optimal data contribution.
• We prove the existence of a Stackelberg equilibrium when the crowdsourcer (leader) imposes constraints on the minimum and maximum amount of data contributed by each user, and we derive sufficient conditions for the Stackelberg equilibrium to be Pareto efficient.
• We demonstrate via simulations using a real-world sensing dataset that, for a common cost, the proposed spatial privacy-preserving Stackelberg game produces greater spatial diversity in the crowdsourced data, leading to better model predictive performance in applications such as spatial field reconstruction compared to two other coverage-maximizing schemes.

Problem Formulation and Analysis
In this paper, we address the following problem: suppose there is a crowdsourcer (the data buyer and leader on the demand side) who aims to buy sensing data from mobile smartphone users (the workers and followers on the supply side) for applications such as spatial monitoring. We aim to design an incentive mechanism such that the collected dataset (i) has good spatial coverage, and (ii) is location privacy-preserving for the workers. We first present our system model and the proposed Stackelberg incentive model before studying the proposed game analytically. We extend the results in (Yang et al. 2012, Section 3) to take into account the workers' location granularities and regions. The notation is summarized in Table 1.

Model and Assumptions
We consider a crowd sensing system that consists of a set $\mathcal{I} = \{1, \ldots, N\}$ of workers and a single crowdsourcer who partitions the entire spatial area of interest into a set of $L$ regions denoted by $\mathcal{L}$. We assume that the workers are rational and non-cooperative, i.e., each worker maximizes its own utility. Each worker $i \in \mathcal{I}$ has its own sensing cost per unit of data $c_i \in (0, \bar{c}]$, location granularity $\rho_i \in [\underline{\rho}, \bar{\rho}]$, cloaking region $\hat{l}_i$ (which may be of different granularity for each worker) and its corresponding partitioned region $l_i$ (defined by the crowdsourcer), where $\hat{l}_i \subseteq l_i \subseteq \mathcal{L}$. The interaction model between the crowdsourcer and the mobile smartphone users is illustrated in Fig. 2. Essentially, the crowdsourcer collects the worker profiles $(c_i, \rho_i, l_i)$ and selects the optimal set of workers that maximizes its utility.
The crowdsourcer then offers the selected workers a reward in exchange for some specified amount of sensing data. Next, the selected workers (which we refer to as participating workers) collect their sensing data, transmit it along with their cloaking region $\hat{l}_i$ to the crowdsourcer, and receive their rewards in return. Note that there are two types of regions, $l_i$ and $\hat{l}_i$, in our model: $l_i$ is the initial coarse-grained partitioned region defined by the crowdsourcer, and $\hat{l}_i$ is worker $i$'s cloaking region, which is submitted only when the worker is selected. We assume that the granularity of worker $i$'s submitted cloaking region $\hat{l}_i$ is proportional to its location granularity parameter $\rho_i$, since a privacy-sensitive worker (with low $\rho_i$) is likely to provide only coarse-grained location information, while a privacy-insensitive worker (with high $\rho_i$) is more likely to provide finer-grained location information. Hence, the parameter $\rho_i$ allows the crowdsourcer to differentiate between workers with different levels of sensitivity to privacy, and it preserves the privacy of unselected workers since workers only reveal more information on their locations when they are selected. Note that the workers can anonymize their precise location information via cloaking regions with area inversely proportional to $\rho_i$ and do not rely on the crowdsourcer for anonymization. In practice, this can be implemented in the crowdsourcing software on the workers' smartphones to allow each worker to select cloaking regions with different (possibly discrete) location granularities.
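As an illustration of how client-side cloaking could work, the following minimal sketch snaps a location to a square grid cell whose area shrinks as the granularity parameter grows. The grid construction, the function names and the `base_side_deg` parameter are assumptions made for this example only; the text specifies just that the cloaking area is inversely proportional to $\rho_i$.

```python
import math


def cloaking_cell(lat, lon, rho, base_side_deg=0.08):
    """Return (south, west, north, east) of the grid cell that contains the point.

    Assumption (not from the paper): cells are squares in degree space whose
    area is inversely proportional to rho, so the side is base_side_deg / sqrt(rho).
    """
    if rho <= 0:
        raise ValueError("rho must be positive")
    side = base_side_deg / math.sqrt(rho)
    south = math.floor(lat / side) * side
    west = math.floor(lon / side) * side
    return (south, west, south + side, west + side)


def cell_area(cell):
    """Area of a cell in squared degrees."""
    south, west, north, east = cell
    return (north - south) * (east - west)


# Hypothetical worker location; only the enclosing cell would be reported.
coarse = cloaking_cell(1.3521, 103.8198, rho=1.0)   # privacy-sensitive worker
fine = cloaking_cell(1.3521, 103.8198, rho=16.0)    # privacy-insensitive worker
```

Both cells contain the true location, so the reported region is imprecise but never false, which is the property the cloaking approach relies on.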
We model the incentive mechanism as a Stackelberg game, which consists of the crowdsourcer (data buyer) as the leader and the $N$ smartphone users (workers and data contributors) as the followers. The crowdsourcer acts first and commits to a reward strategy, while the workers subsequently choose their best responses after observing the crowdsourcer's strategy. The strategy of the crowdsourcer is the reward for each partitioned region $\mathbf{R} = (R_1, \ldots, R_L)$ and the strategy of worker $i$ is the amount of data contributed $t_i \ge 0$. The crowdsourcer only optimizes the reward $R_l$ allocated to each partitioned region $l \in \mathcal{L}$ and subsequently offers each participating worker $i$ in the partitioned region $l$ a fraction of $R_l$ depending on the proportion of data contribution, weighted by the location granularity:
$$\frac{t_i \rho_i}{\sum_{j \in Q_{l_i}} t_j \rho_j}\, R_{l_i}, \tag{1}$$
where $Q_{l_i}$ is the set of participating workers in partitioned region $l_i \in \mathcal{L}$ with $t_i > 0$, and we assume $|Q_{l_i}| > 1$. A similar reward function has also been used in (Yang et al. 2012).
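The sharing rule can be sketched in a few lines. This is an illustrative reading of the text, assuming each worker's share of the region reward is its granularity-weighted contribution $t_i \rho_i$ divided by the regional total; the function name and the dictionary layout are hypothetical.

```python
def split_reward(region_reward, contributions):
    """Split one region's reward among its participating workers.

    contributions maps a worker id to (t_i, rho_i): the amount of data and the
    location granularity. Shares are proportional to t_i * rho_i.
    """
    weighted_total = sum(t * rho for t, rho in contributions.values())
    if weighted_total <= 0:
        raise ValueError("at least one worker must contribute data")
    return {
        worker: region_reward * t * rho / weighted_total
        for worker, (t, rho) in contributions.items()
    }


# Three workers with different data amounts and granularities but the same
# weighted contribution t_i * rho_i = 2, so they are paid equally.
rewards = split_reward(100.0, {"a": (2.0, 1.0), "b": (1.0, 2.0), "c": (4.0, 0.5)})
```

The example shows the trade-off the weighting creates: a worker reporting a coarse region (low granularity) has to supply more data to earn the same reward as a worker reporting a fine region.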

Utility Functions for Crowdsourcer and Workers
In this section we introduce the formulations for the utility functions for each party in the crowdsourcing data mechanism.

Crowdsourcer:
We define the utility function of the crowdsourcer to be $U_{CS}(\mathbf{R}; \mathbf{t}) = \sum_{i \in \mathcal{I}} f_d(t_i, l_i, \rho_i)$, where $f_d(t_i, l_i, \rho_i)$ is a function of worker $i$'s amount of data contributed $t_i$, partitioned region $l_i$, and its location granularity $\rho_i$. For simplicity, we let $f_d(t_i, l_i, \rho_i) = \alpha_i t_i^{\beta}$, where $\alpha_i > 0$ is a function of $(l_i, \rho_i)$, and the parameter $0 < \beta < 1$ is used to control the rate of diminishing returns on each worker's data. The power function was also used in (Powell and Batt 2008) to model diminishing returns. Thus, the crowdsourcer's utility function is given by:
$$U_{CS}(\mathbf{R}; \mathbf{t}) = \sum_{i \in \mathcal{I}} \alpha_i t_i^{\beta}. \tag{2}$$
To increase the coverage area of the collected dataset, the crowdsourcer can assign a higher $\alpha_i$ value to workers located in less populated regions $l_i$. In addition, a higher $\alpha_i$ value can be assigned to workers that provide finer location granularity $\rho_i$. The $\alpha_i$ parameter allows the crowdsourcer to differentiate between the quality (e.g., spatial coverage area and the granularity of the location information) of each worker's data.
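A small numerical sketch makes the effect of the exponent concrete. It evaluates the power-law utility $\sum_i \alpha_i t_i^{\beta}$ described above; the function name and the sample values are illustrative assumptions.

```python
def crowdsourcer_utility(alpha, t, beta=0.5):
    """Sum of alpha_i * t_i**beta over workers, with 0 < beta < 1."""
    if not 0 < beta < 1:
        raise ValueError("beta must lie strictly between 0 and 1")
    return sum(a * ti ** beta for a, ti in zip(alpha, t))


# Same total amount of data (8 units), bought from one worker or from two.
concentrated = crowdsourcer_utility([1.0, 1.0], [8.0, 0.0])
spread = crowdsourcer_utility([1.0, 1.0], [4.0, 4.0])
```

Because of the diminishing returns, spreading a fixed amount of data over more workers yields a higher utility than buying it all from one worker, which is what pushes the crowdsourcer toward broader coverage.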

Workers:
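We define the utility function of worker $i$ to be the amount of reward received from the crowdsourcer as defined in Eq. 1 minus the cost incurred for obtaining the data:
$$u_i(t_i, \mathbf{t}_{-i}) = \frac{t_i \rho_i}{\sum_{j \in Q_{l_i}} t_j \rho_j}\, R_{l_i} - c_i t_i, \tag{3}$$
where $\mathbf{t}_{-i}$ is a vector of the amount of data contributed by all workers except worker $i$.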
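The worker's utility can be evaluated directly from this description. The sketch below assumes the reward is the granularity-weighted proportional share of the region reward and the cost is linear in the amount of data; names and numbers are illustrative.

```python
def worker_utility(i, t, rho, c, region_reward):
    """Reward share of worker i minus its sensing cost, for workers of one region."""
    if t[i] <= 0:
        return 0.0  # a worker who contributes nothing earns and spends nothing
    weighted_total = sum(tj * rj for tj, rj in zip(t, rho))
    reward = region_reward * t[i] * rho[i] / weighted_total
    return reward - c[i] * t[i]


t = [2.0, 1.0]      # data contributed by each worker
rho = [1.0, 2.0]    # location granularity of each worker
c = [1.0, 3.0]      # sensing cost per unit of data
u0 = worker_utility(0, t, rho, c, region_reward=10.0)
u1 = worker_utility(1, t, rho, c, region_reward=10.0)
```

Both workers receive half of the reward here because their weighted contributions are equal; their utilities differ only through their sensing costs.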

Stackelberg Game Formulation
Given that the crowdsourcer wants to achieve a large coverage area for its dataset while satisfying a budget constraint $R_{budget}$, a minimum amount of reward $R_l^{\min} > 0$ for each region $l$ (this allows the crowdsourcer to specify more important regions), and a maximum amount of reward allocation $R_l^{\max} > 0$ for each region $l$ (which may be set to $\infty$), it solves the following optimization problem (Problem 1):
$$\max_{\mathbf{R}}\; U_{CS}(\mathbf{R}; \mathbf{t}^*) \quad \text{s.t.} \quad \sum_{l \in \mathcal{L}} R_l \le R_{budget}, \quad R_l^{\min} \le R_l \le R_l^{\max} \;\; \forall l \in \mathcal{L}, \tag{4}$$
where $t_i^*$ is the optimal solution to Problem 2. Each worker $i$ solves the following optimization problem (Problem 2):
$$\max_{t_i}\; u_i(t_i, \mathbf{t}_{-i}) \quad \text{s.t.} \quad t_i \ge 0. \tag{5}$$
Problems 1 and 2 form the Stackelberg game, and our goal is to find the Stackelberg equilibrium point(s) where neither the crowdsourcer nor the workers have an incentive to deviate. A Stackelberg equilibrium (see Definition 1) is a subgame-perfect Nash equilibrium such that no player can improve its utility by unilaterally deviating from its strategy.
Definition 1 (Stackelberg Equilibrium) Let $\mathbf{R}^*$ be the optimal solution for the crowdsourcer, obtained by solving Problem 1, and $\mathbf{t}^*$ be the optimal solution for the workers, obtained by solving Problem 2. The strategy profile $(\mathbf{R}^*, \mathbf{t}^*)$ is a Stackelberg equilibrium for the proposed Stackelberg incentive model if the following conditions are satisfied for any $(\mathbf{R}, \mathbf{t})$:
$$U_{CS}(\mathbf{R}^*; \mathbf{t}^*) \ge U_{CS}(\mathbf{R}; \mathbf{t}^*), \qquad u_i(t_i^*, \mathbf{t}_{-i}^*) \ge u_i(t_i, \mathbf{t}_{-i}^*) \quad \forall i \in \mathcal{I}.$$
We apply the backward induction method to analyze the proposed Stackelberg incentive model. First, we start with the Followers game (a non-cooperative game played by all the workers) and study the predicted best response $t_i^*$ (solution of Problem 2) for each worker $i$ as a function of the reward $R_{l_i}$ offered by the crowdsourcer and the strategies of the other workers $\mathbf{t}_{-i}$. Subsequently, we analyze the best response of the crowdsourcer in Problem 1.

Nash Equilibrium of Followers game
We first consider the Followers game given by the triplet $(\mathcal{I}, \{t_i\}_{i \in \mathcal{I}}, \{u_i\}_{i \in \mathcal{I}})$, where $\mathcal{I}$ is the player set of $N$ workers and $u_i$ is the utility function of worker $i$. By Lemma 2, there exists a unique Nash equilibrium point in the Followers game. Next, we derive the unique Nash equilibrium of the Followers game in Theorem 3.
Lemma 2 A unique Nash equilibrium exists in the Followers game $(\mathcal{I}, \{t_i\}_{i \in \mathcal{I}}, \{u_i\}_{i \in \mathcal{I}})$.
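Theorem 3 The Followers game $(\mathcal{I}, \{t_i\}_{i \in \mathcal{I}}, \{u_i\}_{i \in \mathcal{I}})$ has a unique Nash equilibrium given by the following closed-form expression:
$$t_i^* = \begin{cases} \dfrac{(|Q_{l_i}| - 1)\, R_{l_i}}{\rho_i \sum_{j \in Q_{l_i}} c_j/\rho_j} \left(1 - \dfrac{(|Q_{l_i}| - 1)\, c_i/\rho_i}{\sum_{j \in Q_{l_i}} c_j/\rho_j}\right), & i \in Q_{l_i}, \\ 0, & \text{otherwise}, \end{cases} \tag{6}$$
where $Q_{l_i}$ is the set of participating workers in region $l_i$.
Proof: See Appendix 2.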
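The form of this equilibrium can be motivated by a short first-order-condition argument. The following is only a sketch and not the proof of Appendix 2: it assumes the proportional reward rule of Eq. 1 and the linear sensing cost of Eq. 3, restricts attention to the participating workers of one region $l$, and introduces the shorthand $x_i = t_i \rho_i$, $\kappa_i = c_i/\rho_i$, $X = \sum_{j \in Q_l} x_j$ and $n = |Q_l|$, none of which is notation from the text.

```latex
\begin{align*}
u_i &= R_l\,\frac{x_i}{X} - \kappa_i x_i, \qquad X = \sum_{j \in Q_l} x_j,\\
\frac{\partial u_i}{\partial x_i} = 0 \;&\Longrightarrow\; R_l\,\frac{X - x_i}{X^2} = \kappa_i,\\
\text{summing over } i \in Q_l \;&\Longrightarrow\; R_l\,\frac{n-1}{X} = \sum_{j \in Q_l}\kappa_j
\;\Longrightarrow\; X = \frac{(n-1)\,R_l}{\sum_{j \in Q_l}\kappa_j},\\
x_i = X - \frac{\kappa_i X^2}{R_l} \;&\Longrightarrow\;
t_i^* = \frac{x_i}{\rho_i} = \frac{(n-1)\,R_l}{\rho_i\sum_{j \in Q_l}\kappa_j}
\left(1 - \frac{(n-1)\,\kappa_i}{\sum_{j \in Q_l}\kappa_j}\right).
\end{align*}
```

Under these assumptions the contribution is positive only when $\kappa_i < \sum_{j \in Q_l} \kappa_j / (n-1)$, which is why a participation condition on $c_i/\rho_i$ is needed to determine $Q_l$.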
We say that worker $i$ is a participating worker if $t_i^* > 0$. Hence, from Eq. 6, all participating workers $i \in Q_{l_i}$ should satisfy the constraint:
$$\frac{c_i}{\rho_i} < \frac{\sum_{j \in Q_{l_i}} c_j/\rho_j}{|Q_{l_i}| - 1}. \tag{7}$$
Subsequently, we will make use of this constraint in Algorithm 1 (modified from (Yang et al. 2012, Algorithm 1)), which computes the Nash equilibrium solution for the Followers game.
The optimal $t_i^*$ for the workers is given by the Nash equilibrium solution Eq. 6 of the Followers game. However, the expression in Eq. 6 requires knowledge of the set of participating workers $Q_{l_i}$, the sensing costs $c_i$, and the location granularity $\rho_i$ of all workers $i \in \mathcal{I}$. Hence, we propose Algorithm 1, which makes use of Eq. 7 to greedily compute $Q_{l_i}$ and solve for the optimal $t_i^*$ of the workers (in terms of $R_{l_i}$). The values of $t_i^*$ can then be substituted into Problem 1 to solve for $\mathbf{R}^*$.
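Algorithm 1 itself is not reproduced in this section, so the following is a hedged sketch of one way such a greedy computation could look for a single region. It assumes that workers are considered in increasing order of $c_i/\rho_i$, that a worker is admitted while it satisfies the participation condition within the enlarged set, and that contributions follow the closed form obtained from the first-order conditions of the proportional-share utility. The function and variable names are hypothetical.

```python
def followers_equilibrium(costs, rhos, region_reward):
    """Greedy participant selection and closed-form contributions for one region.

    Returns (participants, t_star). Assumes at least two workers in the region.
    """
    n = len(costs)
    if n < 2:
        raise ValueError("need at least two workers in the region")
    # Effective cost per granularity-weighted unit of data.
    kappa = [c / r for c, r in zip(costs, rhos)]
    order = sorted(range(n), key=lambda i: kappa[i])
    chosen = order[:2]
    total = kappa[chosen[0]] + kappa[chosen[1]]
    for i in order[2:]:
        # Admit i only if kappa_i < (sum over enlarged set) / (enlarged size - 1).
        if kappa[i] < (total + kappa[i]) / len(chosen):
            chosen.append(i)
            total += kappa[i]
        else:
            break  # every remaining worker is at least as expensive
    m = len(chosen) - 1
    t_star = [0.0] * n
    for i in chosen:
        t_star[i] = m * region_reward / (rhos[i] * total) * (1 - m * kappa[i] / total)
    return chosen, t_star


costs = [1.0, 1.5, 2.0, 8.0]
rhos = [1.0, 1.0, 2.0, 1.0]
participants, t_star = followers_equilibrium(costs, rhos, region_reward=35.0)
```

In this example the fourth worker is too expensive relative to the others and is left out, while the third worker participates despite a higher raw cost because its finer granularity halves its effective cost.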
Lemma 4 Assume that there are at least two workers in each region $l$. Algorithm 1 selects the set of participating workers $Q_l$ that achieves the unique Nash equilibrium solution of the Followers game.
Theorem 5 proves the correctness of Algorithm 1 while Lemma 4 simply states that the Q l set computed in Algorithm 1 is correct.
Theorem 5 Assuming that there are at least two workers in each region $l$, and given the set $Q_l$ from Lemma 4, Algorithm 1 outputs the unique Nash equilibrium solution of the Followers game.

Stackelberg Equilibrium
Using the analytical result Eq. 6 for the Followers game, the crowdsourcer can optimize its reward strategy $\mathbf{R}$ efficiently by substituting the analytical result into its utility function in Eq. 2 to obtain:
$$U_{CS}(\mathbf{R}) = \sum_{l \in \mathcal{L}} \sum_{i \in Q_l} \alpha_i (\tau_i R_l)^{\beta}, \quad \text{where} \quad \tau_i = \frac{|Q_{l_i}| - 1}{\rho_i \sum_{j \in Q_{l_i}} c_j/\rho_j} \left(1 - \frac{(|Q_{l_i}| - 1)\, c_i/\rho_i}{\sum_{j \in Q_{l_i}} c_j/\rho_j}\right). \tag{8}$$
Although it is not trivial to obtain a closed-form expression for $\mathbf{R}$ that maximizes Eq. 8 while satisfying the constraints in Eq. 4, we show in Theorem 6 that there exists a unique Stackelberg equilibrium, which results in a stable equilibrium strategy profile. This allows the crowdsourcer to uniquely predict the behaviors of the workers and efficiently compute its optimal $\mathbf{R}^*$. Both Theorems 3 and 6 extend (Yang et al. 2012, Theorem 2) to the case where the workers' location granularities $\rho$ and regions $l$ are also considered.

Theorem 6
The proposed Stackelberg incentive model has a unique Stackelberg equilibrium.
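Proof Recall from Theorem 3 that the Followers game has a unique Nash equilibrium point. It can be easily shown that the best strategy set of the crowdsourcer is convex and compact, since the domain of $\mathbf{R}$ is a Cartesian product of closed intervals, and $U_{CS}$ is continuous in $\mathbf{R}$. Hence, we need to show the strict concavity of $U_{CS}$ to conclude that there exists a unique maximum point. The partial derivatives of $U_{CS}$ with respect to $R_l$ are as follows:
$$\frac{\partial U_{CS}}{\partial R_l} = \beta R_l^{\beta - 1} \sum_{i \in Q_l} \alpha_i \tau_i^{\beta}, \qquad \frac{\partial^2 U_{CS}}{\partial R_l^2} = \beta(\beta - 1) R_l^{\beta - 2} \sum_{i \in Q_l} \alpha_i \tau_i^{\beta}, \qquad \frac{\partial^2 U_{CS}}{\partial R_l\, \partial R_k} = 0 \;\; (k \ne l),$$
where $\tau_i$ is given in Eq. 8. Note that $\partial^2 U_{CS} / \partial R_l^2 < 0$ as we assume $\alpha_i > 0$ for all $i \in \mathcal{I}$ and $0 < \beta < 1$. Since the Hessian matrix of $U_{CS}$ is a diagonal matrix, its eigenvalues are given by $\partial^2 U_{CS} / \partial R_l^2$, which are all strictly negative. This implies that the Hessian matrix is negative definite and thus $U_{CS}$ is strictly concave in $\mathbf{R}$.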
Using the results from Theorem 5 (which gives the closed-form solution to Problem 2), the Stackelberg equilibrium solution can be computed by solving Problem 1, which is a convex optimization problem. By Theorem 6, there exists a unique reward strategy $\mathbf{R}^*$ that maximizes the crowdsourcer's utility in Problem 1. This means that $\mathbf{R}^*$ can be efficiently computed using well-known interior point methods.
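For intuition, the special case in which only the budget constraint is active can be solved by hand. The sketch below is an illustration under stated assumptions, not the general method: it assumes each equilibrium contribution is proportional to its region's reward, so the reduced utility takes the form $\sum_l A_l R_l^{\beta}$ with a hypothetical aggregated coefficient $A_l > 0$ per region, and it assumes the per-region minimum and maximum rewards are not binding. A Lagrange multiplier argument then gives $R_l \propto A_l^{1/(1-\beta)}$. When the per-region bounds are active, a general convex solver is still needed.

```python
def allocate_budget(region_weights, budget, beta=0.5):
    """Maximise sum_l A_l * R_l**beta subject to sum_l R_l = budget.

    region_weights holds the hypothetical coefficients A_l > 0. Per-region
    reward bounds are ignored (assumed inactive) in this sketch.
    """
    if not 0 < beta < 1:
        raise ValueError("beta must lie strictly between 0 and 1")
    exponent = 1.0 / (1.0 - beta)
    powered = [a ** exponent for a in region_weights]
    scale = budget / sum(powered)
    return [scale * p for p in powered]


def reduced_utility(region_weights, rewards, beta=0.5):
    """Crowdsourcer utility as a function of the region rewards only."""
    return sum(a * r ** beta for a, r in zip(region_weights, rewards))


weights = [1.0, 2.0]
optimal = allocate_budget(weights, budget=10.0, beta=0.5)
equal_split = [5.0, 5.0]
```

With these numbers the region with the larger coefficient receives four times the reward of the other, and the resulting utility exceeds that of an equal split of the same budget.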

Analysis of Optimal Declared Sensing Costs
In this section, we study the strategies of the workers in selecting their optimal sensing costs to declare to the crowdsourcer. This is essential as each worker may declare a false sensing cost to maximize its utility. For this section, we let the declared sensing cost of each worker $i$ be denoted by $c_i$ and its true sensing cost be denoted by $\hat{c}_i$. Note that the true sensing cost $\hat{c}_i$ is unknown to the crowdsourcer and to the other workers $j \in Q_{l_i}$, where $j \ne i$.
Suppose that each worker $i$ may lie about its sensing cost to maximize its utility $u_i$. In such a dishonest reporting scenario, the optimization problem previously specified in Problem 2 is replaced with that considered in Problem 3 below. The interaction model previously shown in Fig. 2 needs to be updated, as each worker $i$ requires additional information on $Q_{l_i}$ and $\sum_{j \in Q_{l_i}} c_j/\rho_j$ to solve Problem 3. One way for the workers to obtain the required information would be for the crowdsourcer to provide a platform for the workers to view the required information for a fixed time period. If the workers can declare or modify their $c_i$ values during the fixed time period, then they can achieve the Nash equilibrium solution given in Theorem 9. Problem 3 is:
$$\max_{c_i}\; u_i = \frac{t_i \rho_i}{\sum_{j \in Q_{l_i}} t_j \rho_j}\, R_{l_i} - \hat{c}_i t_i,$$
where $t_i$ is given by the closed-form expression for participating workers $i \in Q_{l_i}$ and is 0 otherwise, according to Eq. 6.
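We first derive the expression for worker $i$'s utility in terms of its declared sensing cost and the declared sensing costs of the other participating workers $j$ in its region $l_i$. Using Algorithm 1, the crowdsourcer offers to buy an amount of data $t_i$ from each worker $i$ according to Eq. 6. From Eq. 26, we have the expression:
$$\sum_{j \in Q_{l_i}} t_j \rho_j = \frac{|Q_{l_i}| - 1}{\sum_{j \in Q_{l_i}} c_j/\rho_j}\, R_{l_i}.$$
We substitute the expressions for $t_i$ and $\sum_{j \in Q_{l_i}} t_j \rho_j$ into the utility of worker $i \in Q_l$ in Eq. 3 to obtain:
$$u_i = R_{l_i} \left(1 - \frac{(|Q_l| - 1)\, c_i/\rho_i}{\sum_{j \in Q_l} c_j/\rho_j}\right) \left(1 - \frac{(|Q_l| - 1)\, \hat{c}_i/\rho_i}{\sum_{j \in Q_l} c_j/\rho_j}\right). \tag{11}$$
Next, we have the following Theorem 9 on the Nash equilibrium solution of the Followers game when the workers can optimize their declared sensing costs.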
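The effect of misreporting can be checked numerically. The sketch below is an illustration under assumptions: the crowdsourcer computes the purchased amount from the declared costs using the closed form derived from the proportional-share rule, the worker pays its true cost on that amount, and every listed worker is treated as a candidate participant (a worker whose share would be non-positive is simply not selected). The function name and the numbers are hypothetical.

```python
def utility_with_declared_cost(i, declared, true_cost_i, rhos, region_reward):
    """Utility of worker i when rewards follow declared costs but it pays true_cost_i."""
    kappa = [c / r for c, r in zip(declared, rhos)]
    total = sum(kappa)
    m = len(declared) - 1
    # Fraction of the region's weighted data (and hence reward) assigned to i.
    share = 1 - m * kappa[i] / total
    if share <= 0:
        return 0.0  # not selected: no reward and no cost
    t_i = m * region_reward / (rhos[i] * total) * share
    return region_reward * share - true_cost_i * t_i


rhos = [1.0, 1.0, 1.0]
# Worker 2 has a true cost of 3.0, while the other two workers declare 1.0.
honest = utility_with_declared_cost(2, [1.0, 1.0, 3.0], 3.0, rhos, region_reward=12.0)
under_declared = utility_with_declared_cost(2, [1.0, 1.0, 1.5], 3.0, rhos, region_reward=12.0)
```

In this example the honest worker is not selected and earns nothing, whereas under-declaring its cost gets it selected at a loss, which matches the behaviour that Lemma 7 describes.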
Lemma 7 Suppose that worker $i \in Q_l$ declares a sensing cost $c_i < \hat{c}_i$; then it always incurs a negative utility.
Proof Suppose that worker $i$ would not be in the set of participating workers $Q_l$ if it did not lie about its sensing cost, i.e., $\hat{c}_i/\rho_i \ge \sum_{j \in Q_l} (c_j/\rho_j) / (|Q_l| - 1)$. To be in the set of participating workers $Q_l$, worker $i$ can declare a sensing cost $c_i < \hat{c}_i$ such that $c_i/\rho_i < \sum_{j \in Q_l} (c_j/\rho_j) / (|Q_l| - 1)$. In order for its utility to be negative, i.e., $u_i < 0$, we have the following necessary condition from Eq. 11: $\hat{c}_i/\rho_i > \sum_{j \in Q_l} (c_j/\rho_j) / (|Q_l| - 1)$. Similarly, for $u_i > 0$, we have the following necessary condition from Eq. 11: $\hat{c}_i/\rho_i < \sum_{j \in Q_l} (c_j/\rho_j) / (|Q_l| - 1)$. Hence, the proof is complete.
Lemma 8 Given the set of participating workers $Q_l$, worker $i \in Q_l$ can maximize its utility $u_i$ through its choice of declared sensing cost $c_i$.
Proof: See Appendix 5.
Theorem 9 Assuming that each worker $i \in Q_l$ optimizes its declared sensing cost $c_i$ using Lemma 8, the Nash equilibrium solution of the Followers game occurs at the equilibrium point given in Eq. 12.
Proof By Lemma 7, we have the constraint $\hat{c}_i/\rho_i < \sum_{j \in Q_l} (c_j/\rho_j) / (|Q_l| - 1)$ for all workers $i \in Q_l$ to ensure that all participating workers $i \in Q_l$ have a positive utility, i.e., $u_i > 0$. By Lemma 8, worker $i \in Q_l$ can maximize its utility $u_i$ by declaring the sensing cost specified in that lemma.
Remarks: If we assume that the workers $i \in Q_l$ have a fixed time period to declare or modify their $c_i$ values and that the crowdsourcer provides a platform for each individual worker to observe the total declared sensing cost $c_j$ and location granularities $\rho_j$ of all other workers $j \in Q_l$ in the same region, then the equilibrium point in Eq. 12 can be achieved.

Adding Bounds on the Amount of Workers' Data and Achieving Pareto Efficiency
In this section, we study how the crowdsourcer is able to introduce bounds on the workers' data contributions and how it can achieve Pareto efficiency for the Stackelberg equilibrium.

Bounds on the Amount of Contributed Data t i
In a practical real-world crowdsourcing application, the crowdsourcer may impose lower and upper bounds on the amount of data contributed by each worker $i$ for various reasons. For example, it may not be useful to the crowdsourcer if each worker only contributes a small amount of data; hence we consider the imposition of a lower bound $\underline{t\rho}$ on the workers' data contribution. At the other extreme, to maximize the spatial coverage of the obtained data, it may not be useful to the crowdsourcer if a small number of workers monopolize the data supply. Hence, we also consider the imposition of an upper bound $\overline{t\rho}$ on the workers' data contribution. For simplicity, we assume that the same bounds $\underline{t\rho}$ and $\overline{t\rho}$ apply to all participating workers.
Next, we have the following Theorem 10, which introduces the lower and upper bounds on worker $i$'s $t_i \rho_i$ values, and Theorem 14, which states the sufficient condition for the bounds. Finally, we present Algorithm 2, which computes the bounded Nash equilibrium solution for the Followers game.
Theorem 10 Assume that $c_i/\rho_i \le 1$ for all $i \in \mathcal{I}$. The crowdsourcer is able to introduce both the lower bound $\underline{t\rho}$ and the upper bound $\overline{t\rho}$ in the Stackelberg equilibrium of all participating workers $i$ in region $l$ if the constraints in Eq. 14 (for the lower bound) and Eq. 17 (for the upper bound) are satisfied, where we let the sensing cost and location granularity of the worker $i$ with the least $c_i/\rho_i$ value in region $l$ be denoted by $c_{M_l}$ and $\rho_{M_l}$ respectively.
Proof We first show the following two Lemmas, which are used to prove the theorem.
Lemma 11 Suppose that $c_i/\rho_i \le 1$ for all $i \in \mathcal{I}$ and Eq. 14 is true. Then the crowdsourcer is able to introduce a lower bound $\underline{t\rho}$ in the Stackelberg equilibrium of all participating workers $i$ in region $l$.
We first prove Lemma 11. To show that the Stackelberg solution exists in the constrained region, one must show that $t_i^* \rho_i \ge \underline{t\rho}$ for all $i \in Q_l$. To prove this, we write the constraint in Eq. 14 as an equality for $c_i/\rho_i$ with a slack term $\delta_i \ge 0$. We take the solution in Eq. 6, substitute this expression for $c_i/\rho_i$ and multiply both sides by $\rho_i$ to obtain Eq. 15. Since $\delta_i \ge 0$, we conclude from Eq. 15 that $t_i^* \rho_i \ge \underline{t\rho}$ for all $i \in Q_l$. To verify that the introduction of the lower bound preserves the concavity of the crowdsourcer's Problem 1, we analyze the Hessian matrix of $U_{CS}$ with respect to $R_l$.
We first apply the chain rule: We now attempt to derive l .From Eq. 15, we have is inversely proportional to R l i .This is because |Q l | ∝ R l i according to Eq. 14 and the term at a rate slower than |Q l | when c i ρ i ≤ 1.Therefore, we conclude from Eq. 15, that t i is a monotonically non-decreasing function of R l .This implies convexity, i.e., l ≤ 0, the Hessian matrix of U CS is negative semidefinite for all R l ∈ R. Thus, we conclude that U CS is concave in R.
To introduce the lower bound tρ constraint in the proposed Stackelberg incentive model, the constraint Eq. 14 should be used instead of constraint Eq. 7 in line 6 of Algorithm 1.
Next, we derive the sufficient conditions for the upper bound tp on the workers' data contributions. For each region l, we let the sensing cost and location granularity of the "cheapest" worker i with the least c i ρ i value be denoted by c M l and ρ M l respectively. Using the t * i expression in Eq. 6, we proceed to derive the maximum t i ρ i contributed by the "cheapest" worker in each region l: This leads to the following Lemma 12.
Lemma The crowdsourcer is able to introduce an upper bound tp in the Stackelberg equilibrium of all participating workers i in region l if:
We now prove Lemma 12. According to Eq. 6, the worker i ∈ Q l with the least c i ρ i value will contribute the largest amount of t i ρ i in its region l. Thus, to prove that the constraint in Eq. 17 leads to t * i ρ i ≤ tp for all i ∈ Q l , it is sufficient to show that the data contribution t M l ρ M l from the participating worker i with the least c i ρ i value is less than or equal to the upper bound tp, i.e., t M l ρ M l ≤ tp. Indeed, we substitute the R max l term from Eq. 17 into Eq. 16 to obtain t M l ρ M l = tp. This implies that t * i ρ i ≤ tp for all i ∈ Q l . Hence, the proof is complete. Note that the constraint on R l simply constrains the feasible region and does not affect the concavity of the crowdsourcer's Problem 1.
Corollary The Stackelberg equilibrium can be shown to exist under the constraints in Theorem 10, but the uniqueness property of the unconstrained solution is lost. Furthermore, the solution can be obtained via the same algorithmic solution as utilized in the original unconstrained case, with a minor modification to Algorithm 1 as shown in Algorithm 2.
Finally, Theorem 14 states the sufficient conditions under which each worker's optimal data contribution satisfies the crowdsourcer's tρ and tp bounds. This allows the crowdsourcer to estimate the number of participating workers in a region l given its chosen bounds tρ and tp, and reward allocation R l .
Theorem Suppose Eq. 18 is true for all i ∈ Q l . Then worker i's optimal data contribution in the Stackelberg equilibrium will satisfy the crowdsourcer's tρ and tp bounds, i.e., tρ ≤ t * i ρ i ≤ tp: where
Proof We first show that Eq. 18 is a sufficient condition for t * i ρ i ≤ tp. From Eq. 18, we obtain: where (i) we substitute R l i from Eq. 26, and (iii) we substitute t * i ρ i = j ∈Q l i t j ρ j − c i ρ i R l i j ∈Q l i t j ρ j 2 from Eq. 25.
After proving that Eq. 18 is a sufficient condition for t * i ρ i ≤ tp, we show that Eq. 18 also leads to t * i ρ i ≥ tρ. From Eq. 18, we have , which by Lemma 11 leads to t * i ρ i ≥ tρ. Hence, the proof is complete.
Remarks: The feasible tρ and tp for each worker i vary according to its sensing cost c i and location granularity ρ i , which affect δ i (the difference between the maximum allowable c i ρ i value in the set of participating workers and worker i's c i ρ i value). When δ i = 0, there is strict equality in Eq. 18 and tρ equals tp. However, as δ i increases, we have tρ < tp. For a fixed reward R l , the feasible range between tρ and tp increases as the number of participating workers |Q l i | increases.

Achieving Pareto Efficiency
We examine the Pareto efficiency (see Definition 15) of the Stackelberg equilibrium point in our proposed Stackelberg incentive model and study how to achieve efficiency. At the Pareto efficient equilibrium point, the crowdsourcer's rewards are allocated in such a way that it is not possible to reallocate them to increase the utility of any individual (including the crowdsourcer itself) without decreasing the utility of at least one other individual. In other words, it implies an efficient allocation of resources.
Definition (Pareto Efficiency) A strategy profile (R P , t P ) is Pareto efficient if there exists no other strategy profile (R, t), where R ⪰ 0 and t ⪰ 0, such that U CS (R; t) ≥ U CS (R P ; t P ) and u i (t i ; t −i , R l i ) ≥ u i (t P i ; t P −i , R P l i ) for all i ∈ I, with at least one strict inequality.
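The definition above can be checked numerically for a finite set of candidate strategy profiles. The following is a minimal sketch, assuming each profile has already been reduced to a vector of utilities ordered as [U CS , u 1 , ..., u n ]; the function names `pareto_dominates` and `is_pareto_efficient` are introduced here for illustration only and are not part of the proposed model.

```python
from typing import Sequence


def pareto_dominates(candidate: Sequence[float], reference: Sequence[float]) -> bool:
    """Return True if `candidate` Pareto-dominates `reference`.

    Both vectors list the utilities of all players in the same order,
    e.g. [U_CS, u_1, ..., u_n] (crowdsourcer first, then the workers).
    """
    if len(candidate) != len(reference):
        raise ValueError("utility vectors must have the same length")
    # Nobody is worse off under the candidate profile ...
    no_one_worse = all(c >= r for c, r in zip(candidate, reference))
    # ... and at least one player is strictly better off.
    someone_better = any(c > r for c, r in zip(candidate, reference))
    return no_one_worse and someone_better


def is_pareto_efficient(profile: Sequence[float],
                        alternatives: Sequence[Sequence[float]]) -> bool:
    """A profile is efficient (within `alternatives`) if nothing dominates it."""
    return not any(pareto_dominates(alt, profile) for alt in alternatives)
```

Such a check only certifies efficiency relative to the alternatives supplied; it does not replace the analytical argument over the full strategy space.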
Theorem The proposed (unbounded) Stackelberg game has a unique Stackelberg equilibrium (R SE , t SE ) that may not be Pareto efficient.
Proof: See Appendix 6.
By Theorem 16, the proposed (unbounded) Stackelberg game has a unique Stackelberg equilibrium (R SE , t SE ) that may not be Pareto efficient. To achieve efficiency of the Stackelberg equilibrium, we first define a social welfare function w(R, t) to be the weighted sum of the crowdsourcer's and the workers' utilities: w(R, t) = γ CS U CS (R; t) + Σ i∈I γ i u i (t i ; t −i , R l i ), for some weights γ CS , γ 1 , . . . > 0.
It is well known that any allocation which maximizes a social welfare function is also Pareto efficient. Thus, to achieve efficiency of the Stackelberg equilibrium, we can introduce the penalty function: One way to encourage the crowdsourcer to maximize w(R, t) would be to introduce a third-party regulator that can offer tax rebates proportional to the weighted sum of the workers' utilities (t), thus contributing to the crowdsourcer's utility. Note that the penalty function (t) is a convex function of R and hence does not affect the convexity of the crowdsourcer's Problem 1. Therefore, the unique Stackelberg equilibrium solution still exists.

Simulation Study
To evaluate the performance of the proposed Stackelberg incentive model in a real-world spatial data sensing application, we design a spatial data sensing case study application where we assume that the crowdsourcer wishes to perform a spatial estimation task.
To achieve this, the crowdsourcer requires data from mobile smartphone users (workers) located in a range of different locations. In general, the greater the spatial diversity of the data, the easier it is for the crowdsourcer to perform the spatial estimation task. In particular, we design the case study around a spatial prediction problem. The dataset collected by the crowdsourcer is obtained by applying our spatial privacy-preserving data sharing mechanism based on the proposed Stackelberg incentive model (Algorithm 2), which attempts to maximize spatial coverage.
We consider a real-world mobile crowd sensing problem, namely the spatial monitoring and prediction of environmental temperature (Chen et al. 2015; Mun et al. 2009). This involves the crowdsourcer incentivizing and paying the workers for the data they collect from their spatially placed sensors. Using the collected dataset, the crowdsourcer wishes to make a spatial estimation, achieved by conducting a spatial regression based on a Gaussian process (Rasmussen and Williams 2005) estimated from the collected data. We then evaluate the spatial regression performance of the proposed Stackelberg incentive model against two baseline non-game-theoretic incentive schemes that seek to maximize the spatial coverage of the collected dataset.

Baseline Coverage Metrics
We consider two baseline coverage-maximizing schemes: (i) the location-based incentive mechanism proposed in Jaimes et al. (2012), which maximizes a geometric disk coverage model, and (ii) the work in Xiong et al. (2016), which maximizes a k-depth coverage model. The two coverage models (see Fig. 3) are detailed as follows.

Geometric Disk Coverage Metric
The geometric disk coverage scheme was proposed in Jaimes et al. (2012) to measure the coverage c(x i ) of a sensor data from a precise (uncloaked) location x i : where || • || 2 denotes the Euclidean distance and r is the sensed radius of the sensor data.
To optimize the disk coverage metric, the crowdsourcer greedily buys the minimum amount of data t from each worker in regions where c(x i ) = 0, starting with the cheapest worker first.For a fair comparison with our proposed Stackelberg incentive model, we let c(x i ) = 1 when x i ∈ l where l is a partitioned region.

k -depth Coverage Metric
The following k-depth coverage model (and its variants) was proposed in Xiong et al. (2016) to measure coverage of a set of N sensor data t 1 , . . ., t N from a region l where the coverage c(t 1 , . . ., t N ) = min(N, k) or equivalently: where k is the depth parameter.
To optimize the k-depth coverage metric, the crowdsourcer greedily buys the minimum amount of data t from k workers in each region where c(t i ) = 0, starting with the cheapest worker first.
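The greedy purchasing rule described above can be sketched as follows. This is an illustrative reading of the baseline rather than the implementation used in the simulations: it assumes workers are represented as hypothetical (worker id, region id, unit cost) tuples, that each selected worker is paid its sensing cost for the minimum amount of data `t_min`, and that purchasing stops once the budget is exhausted. Setting k = 1 gives the region-based variant of the disk coverage scheme.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

# (worker id, region id, sensing cost per unit of data); a hypothetical layout
Worker = Tuple[str, str, float]


def k_depth_coverage(counts: Dict[str, int], k: int) -> int:
    """Sum over regions of min(N_l, k), where N_l is the number of data in region l."""
    return sum(min(n, k) for n in counts.values())


def greedy_k_depth_purchase(workers: List[Worker], k: int, budget: float,
                            t_min: float = 1.0):
    """Buy t_min units from the cheapest workers until every region holds k
    workers or the budget runs out. Returns (selected ids, per-region counts,
    remaining budget)."""
    selected: List[str] = []
    counts: Dict[str, int] = defaultdict(int)
    remaining = budget
    for worker_id, region, cost in sorted(workers, key=lambda w: w[2]):
        price = cost * t_min
        if price > remaining:
            break  # workers are sorted by cost, so no later worker is affordable
        if counts[region] >= k:
            continue  # this region is already covered to depth k
        selected.append(worker_id)
        counts[region] += 1
        remaining -= price
    return selected, dict(counts), remaining
```

For example, with five workers spread over two regions and a budget of 1.0, `k = 1` selects the cheapest worker in each region, while `k = 2` adds further workers only where the budget still allows.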

Simulation Setup
We use the temperature measurements from the Intel lab dataset (Madden 2004), which contains the temperature, humidity, light, voltage, connectivity, and location information collected from 54 sensor nodes deployed in the Intel Berkeley Research lab between February 28th and April 5th, 2004. We partitioned the lab's spatial area into the eight regions l shown in Fig. 4. The proposed Stackelberg incentive model and the baseline coverage schemes are then used to purchase a subset of the available temperature data. We took a one-hour interval (from 01:00-02:00, 28/2/2004) from the dataset and applied the Gaussian process regression technique (Rasmussen and Williams 2005), a supervised learning technique for regression, to evaluate how the two coverage metrics correspond to the actual amount of predictive uncertainty. The (Gaussian) radial basis function (RBF) kernel (Rasmussen and Williams 2005) is used for our Gaussian process regression.
We briefly introduce the main idea behind Gaussian process regression. Given a set of n training input locations X and observations y, we assume that the observed y are generated by some latent function f plus independent and identically distributed Gaussian noise with zero mean and variance σ 2 y . Suppose there are n * test points where we are interested in obtaining the predicted observation values. Let k(•, •) be a covariance function and let K(X, X * ) denote the n × n * (kernel) matrix of the covariances evaluated at all pairs of training and test points, and similarly for K(X, X), K(X * , X * ) and K(X * , X). The predictive (posterior) mean f * (x * ) and variance V (x * ) of the Gaussian process regression for a new input test vector x * are given by:

\[ f_*(x_*) = K(x_*, X)\,\bigl[K(X, X) + \sigma_y^2 I\bigr]^{-1} y, \]
\[ V(x_*) = k(x_*, x_*) - K(x_*, X)\,\bigl[K(X, X) + \sigma_y^2 I\bigr]^{-1} K(X, x_*). \]

There are a number of publicly available libraries that implement Gaussian process regression, and we chose to use Python's scikit-learn machine learning library (Pedregosa et al. 2011) in our simulations.

Fig. 4 Partitioned regions of the Intel lab, which contains 54 sensor nodes
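To make the predictive mean and variance concrete, the sketch below evaluates them directly with NumPy for an RBF kernel and a zero-mean prior. It is an illustration under assumed hyperparameters (`length_scale`, `signal_var`, and `noise_var` are placeholder values chosen here, not the settings used in the simulations); scikit-learn's `GaussianProcessRegressor` provides the same quantities together with hyperparameter fitting.

```python
import numpy as np


def rbf_kernel(A, B, length_scale=1.0, signal_var=1.0):
    """Squared-exponential covariance between the rows of A and the rows of B."""
    A = np.asarray(A, dtype=float)
    B = np.asarray(B, dtype=float)
    sq_dists = (np.sum(A**2, axis=1)[:, None]
                + np.sum(B**2, axis=1)[None, :]
                - 2.0 * A @ B.T)
    return signal_var * np.exp(-0.5 * np.maximum(sq_dists, 0.0) / length_scale**2)


def gp_predict(X, y, X_star, noise_var=1e-2, length_scale=1.0):
    """Posterior mean and variance of the latent function at X_star."""
    y = np.asarray(y, dtype=float)
    K = rbf_kernel(X, X, length_scale) + noise_var * np.eye(len(y))
    K_s = rbf_kernel(X, X_star, length_scale)        # n x n* cross-covariances
    K_ss = rbf_kernel(X_star, X_star, length_scale)  # n* x n* test covariances
    L = np.linalg.cholesky(K)                         # K = L L^T
    alpha = np.linalg.solve(L.T, np.linalg.solve(L, y))  # [K + noise I]^{-1} y
    mean = K_s.T @ alpha
    v = np.linalg.solve(L, K_s)
    var = np.diag(K_ss) - np.sum(v**2, axis=0)
    return mean, var
```

The variance returned for a test point close to purchased data is small, whereas a point far from all training locations reverts to the prior variance. In practice the observations would typically be centered (or a mean function fitted) before applying a zero-mean prior.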

Test Scenarios
The following two test scenarios were examined.
Scenario (I): Spatial regression of the two cross intersections of interest (second and fourth column from the left side of Fig. 4) where no sensor data is available. We assume that each worker i's sensing cost c i is inversely proportional to its distance from the nearest cross intersection of interest, i.e., workers have a higher sensing cost if they are located near the two cross intersections. We also assume that their location granularities ρ i are inversely proportional to the number of workers in their respective regions, i.e., workers located in denser regions are less privacy-sensitive. We use the predictive variance V (x * ) of the spatial Gaussian process regression at the two cross intersections (points of interest) as the metric for comparison. Intuitively, a lower predictive variance implies better predictive performance.
Scenario (II): Spatial regression of the entire spatial area shown in Fig. 4. We assume that the workers' sensing costs c i are uniformly selected from {0.25, 0.5, 0.75, 1} and their location granularities ρ i are inversely proportional to the number of workers in their respective regions. We conducted spatial Gaussian process regression to obtain the predicted (mean) temperature f * (x * ) at all the sensor locations. We use the mean squared error (MSE), computed from the difference between the predicted temperature and the actual temperature measurement in the dataset, as the metric for comparison. Intuitively, a lower MSE value implies better predictive performance.
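The parameter assignment for Scenario (II) can be sketched as follows. This is a hedged illustration: the mapping `worker_regions`, the function name, the fixed random seed, and the proportionality constant of 1 for ρ i are assumptions made here, since the text only states that ρ i is inversely proportional to the number of workers in the region.

```python
import random
from collections import Counter
from typing import Dict, Tuple

COST_LEVELS = (0.25, 0.5, 0.75, 1.0)  # sensing-cost levels stated for Scenario (II)


def assign_scenario_two_parameters(worker_regions: Dict[str, str],
                                   seed: int = 0) -> Dict[str, Tuple[float, float]]:
    """Map each worker id to (sensing cost c_i, location granularity rho_i)."""
    rng = random.Random(seed)  # seeded for reproducibility
    region_sizes = Counter(worker_regions.values())
    params = {}
    for worker, region in worker_regions.items():
        cost = rng.choice(COST_LEVELS)         # uniform over the four levels
        rho = 1.0 / region_sizes[region]       # assumption: constant of proportionality 1
        params[worker] = (cost, rho)
    return params
```

With this assignment, workers in a region containing two sensors receive ρ i = 0.5, while a worker alone in its region receives ρ i = 1.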

Simulation Parameters
We set the sensing cost c i ∈ (0, 1], location granularity = ∞ for all l ∈ L, minimum data t = 1, and maximum data t = 3. We chose a budget constraint R budget to limit the amount of purchased data, i.e., no scheme can afford to purchase all the available temperature data. In addition, the budget spent by each of the three schemes is capped at the same amount. In the two baseline coverage schemes, we offer each worker their sensing cost for each unit of data. We let the crowdsourcer's system parameters be α i = ρ i and β = 0.1 in our proposed Stackelberg incentive model. We varied the k value of the baseline k-depth coverage scheme.

Simulation Results and Discussion
We now discuss the simulation results for the test scenarios (I) and (II).
Scenario (I): We list the predictive variances of the two baseline coverage schemes and the proposed Stackelberg incentive model at the two cross intersections of interest in Table 2, and the corresponding coverage scores in Table 3.
From the results, we observe that the proposed Stackelberg incentive model has better predictive variances at the two cross intersections of interest compared to the baseline disk coverage scheme. The predictive variances from the k-depth coverage scheme and the proposed Stackelberg incentive model are the same for all values of k > 1. This is because the sensing costs of the workers are the lowest when the workers are located far away from the two cross intersections of interest and, under the limited budget constraint, the k-depth coverage scheme and the proposed Stackelberg incentive model will select the same set of participating workers. Note that the disk coverage scheme used here is equivalent to the k-depth coverage scheme when k = 1. This is due to the usage of the partitioned regions instead of disk coverage areas, which are not applicable to our scenario. To visualize the locations of the participating workers and the predictive variances, we plot the heat map of the predictive variances for the baseline coverage schemes and the proposed Stackelberg incentive model in Fig. 5. The two cross labels in the figures represent the two points of interest where no sensor data is available. As the disk coverage scheme only selects one worker per region, it does not collect enough data points compared to the other two schemes.
Scenario (II): We list the MSE values of the two baseline coverage schemes and the proposed Stackelberg incentive model for the entire spatial area of interest in Table 4, and the corresponding coverage scores in Table 5. From the results, we observe that the proposed Stackelberg incentive model has better predictive performance (lower MSE value) over the entire spatial area of interest compared to the two baseline coverage schemes. While the baseline coverage schemes simply prioritize data from cheaper workers, our Stackelberg incentive model is able to offset the higher sensing costs of the workers with the α i parameters. To visualize the locations of the participating workers and the predicted mean temperature values, we plot the heat map of the predicted mean values for the baseline disk and k-depth schemes for k = 3 (where the MSE value is the lowest) and the proposed Stackelberg incentive model in Fig. 6.

Accommodating Location Uncertainty
The Gaussian process regression technique can accommodate the location uncertainty of the workers' sensing data that arises from the use of cloaking regions. Assuming that the input locations x of the sensing data are corrupted by i.i.d. Gaussian noise with the noise variance set to the square of the approximated radius of the cloaking regions, the Gaussian process regression model proposed in Mchutchon and Rasmussen (2011) can be applied to account for location uncertainty. In essence, an additional corrective term proportional to the gradient of the posterior mean is added to the noise term in Eq. 21 to account for the location uncertainty of the training inputs.
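One way to write this correction, following the general form of the noisy-input Gaussian process model and assuming isotropic input noise, is sketched below. The symbols r (approximate cloaking radius), Σ x (input noise covariance) and ∂̄ i (gradient of the posterior mean at training location x i ) are introduced here for illustration only.

```latex
\tilde{\sigma}_{y,i}^{2}
  = \sigma_y^{2} + \bar{\partial}_i^{\top}\, \Sigma_x\, \bar{\partial}_i ,
\qquad
\bar{\partial}_i = \left. \frac{\partial \bar{f}(x)}{\partial x} \right|_{x = x_i},
\qquad
\Sigma_x = r^{2} I .
```

Under these assumptions, the matrix K(X, X) + σ 2 y I in the predictive equations is replaced by K(X, X) + diag(σ̃ 2 y,1 , ..., σ̃ 2 y,n ), so that training points lying where the predicted surface changes quickly are treated as noisier.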

Conclusion
We designed a privacy-aware Stackelberg incentive mechanism that improves the spatial coverage of the collected dataset. Our proposed incentive model is privacy-aware in that it allows privacy-sensitive mobile smartphone users to submit coarse-grained (or quantized) location information to the crowdsourcer. We studied the properties of the proposed Stackelberg incentive model analytically and presented efficient algorithmic solutions. We also extended the basic model to accommodate bounds on the users' data contributions and studied how Pareto efficiency can be achieved. We showed via simulations using a real-world sensing dataset that our proposed incentive model produced greater spatial diversity in sourced data, leading to better model predictive performance compared to two other coverage-maximizing schemes that maximize a different coverage metric. For future work, it would be interesting to extend our (static game) Stackelberg model to dynamic games played over a period of time, where the smartphone users are allowed to move between regions. While our Stackelberg equilibrium is stable in the studied static model played by the crowdsourcer and the mobile smartphone users in one time period, a different notion of equilibrium needs to be considered for the dynamic setting.
Therefore, a Nash equilibrium exists in the Followers game.

Appendix 2. Proof of Theorem 3
By Lemma 2, there exists a unique strategy profile that maximizes the utility of each worker given the strategies of the other workers. Thus, if each worker i plays its best response strategy, it will achieve the unique Nash equilibrium point t * i . To prove Theorem 3, we derive t * i by solving ∂u i /∂t i = 0 to obtain: Next, we manipulate the expression in Eq. 23 to obtain: where (i) we express t * i ρ i in terms of the other t * j ρ j values, (ii) we sum up the t * i ρ i values in Eq. 25 for all participating workers in region l i , and (iii) we divide the previous expression by Σ j∈Q l i t * j ρ j .
Finally, we substitute Eq. 26 into Eq. 25 to obtain the unique Nash equilibrium point for each worker i as required.
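The three manipulation steps above can be checked numerically. The sketch below is an illustration under a stated assumption, not a transcription of Eq. 6: it assumes each worker's utility takes the proportional-share form u i = R ρ i t i / Σ j ρ j t j − c i t i , which is consistent with the identity Σ j u j = R − Σ j c j t j used in Appendix 6. Under that assumption, steps (i) to (iii) yield a closed form, and the code verifies that it satisfies the stationarity condition ∂u i /∂t i = 0. All function names are hypothetical.

```python
from typing import List


def equilibrium_contributions(reward: float, costs: List[float],
                              rhos: List[float]) -> List[float]:
    """Closed-form stationary point for one region under the ASSUMED utility
    u_i = reward * rho_i * t_i / sum_j(rho_j * t_j) - c_i * t_i."""
    n = len(costs)
    if n < 2:
        raise ValueError("need at least two participating workers")
    ratios = [c / r for c, r in zip(costs, rhos)]
    total_ratio = sum(ratios)
    # Steps (ii)-(iii): summing the first-order conditions gives sum_j t_j * rho_j.
    total_effective = (n - 1) * reward / total_ratio
    contributions = []
    for ratio, rho in zip(ratios, rhos):
        share = 1.0 - (n - 1) * ratio / total_ratio
        if share <= 0:
            raise ValueError("a worker is too costly to participate in this set")
        # Step (i) rearranged: t_i * rho_i = total_effective * share.
        contributions.append(total_effective * share / rho)
    return contributions


def marginal_utility(i: int, t: List[float], reward: float,
                     costs: List[float], rhos: List[float]) -> float:
    """du_i/dt_i for the assumed utility; zero at an interior equilibrium."""
    total = sum(tj * rj for tj, rj in zip(t, rhos))
    own = t[i] * rhos[i]
    return reward * rhos[i] * (total - own) / total**2 - costs[i]
```

For instance, with a reward of 10, costs (0.5, 0.6, 0.7) and granularities (1.0, 0.8, 0.9), every worker's marginal utility evaluates to zero up to floating-point error at the returned contributions.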

Appendix 3. Proof of Lemma 4
Consider the region l. Suppose the constraint in Eq. 7 is not satisfied, then the following constraints must be true: |Q l |−2 . Hence, the greedy computation of the set of participating workers Q l from the sorted c i ρ i values in Algorithm 1 does not affect the set of participating workers Q l that achieves the unique Nash equilibrium solution of the Followers game.

Appendix 4. Proof of Theorem 5
To show that the solution obtained from Algorithm 1 is the unique Nash equilibrium (NE) solution of the workers, we prove that the ∂u i /∂t i = 0 (stationary point) condition given in Eq. 24 is satisfied by the set of participating workers i ∈ Q. By Lemma 2, there exists a unique NE in the Followers game. Hence, any stationary point is the unique NE point for the We substitute the expressions for t * i and Σ j∈Q l i t * j ρ j into the ∂u i /∂t i expression in Eq. 24 to obtain the following equality: Since Eq. 28 satisfies the expression for ∂u i /∂t i = 0, we conclude that Algorithm 1 correctly outputs the unique Nash equilibrium solution of the workers in Q l . Also, by Lemma 4, Algorithm 1 correctly computes the set of participating workers Q l as used in Eq. 28.

Appendix 5. Proof of Lemma 8
From Eq. 11, we have the following expression for all workers i ∈ Q l : (29) From Eq. 29, we set ∂u i /∂c i = 0 to obtain the critical point: (30) We substitute the expression of the critical point derived in Eq. 30 into the ∂ 2 u i /∂(c i ) 2 expression in Eq. 29 to obtain: The denominator term Σ j∈Q l i c j ρ j in the ∂ 2 u i /∂(c i ) 2 expression from Eq. 29 is positive due to both c i and ρ i being positive. Since we assume that |Q l i | > 2, we have ∂ 2 u i /∂(c i ) 2 < 0. This leads us to conclude that the critical point is a maximum point. Hence, the proof is complete.

Appendix 6. Proof of Theorem 16
To prove Theorem 16, we make the following two claims.
Claim Suppose that the α i , ρ i , and c i values are constant for all workers i ∈ I. Then the Stackelberg equilibrium of the proposed Stackelberg incentive model is Pareto efficient.
Proof To prove the claim, we use the following key observations: Σ j∈Q l i u j (t j ; t −j , R l j ) = R l i − Σ j∈Q l i c t j .
( when the α i , ρ i , and c i values are constant for all workers i ∈ I. From Eq. 32, we know that Σ j∈Q l i u j (t j ; t −j , R l j ) is inversely proportional to
Proof Consider the scenario where there are 2 regions l 1 and l 3 with 2 workers each. Let ρ i = 1, c i = 1 for the workers in the 2 regions. Let l 1 = l 2 and l 3 = l 4 , α 1 = α 2 , α 3 = α 4 , and α 1 < α 3 . Intuitively, given that the worker costs are the same but α 1 < α 3 , then R SE l 1 < R SE l 3 . Suppose we have R l 3 > R SE l 3 and R l 1 < R SE l 1 (recall that the total reward is bounded, so R l 1 must decrease if R l 3 increases). Let R l 3 = R SE l 3 + and R l 1 = R SE l 1 − where > 0.

Fig. 3 Illustration of the two coverage metrics used

Fig. 5 Heatmap of the predictive standard deviation for (i) disk, and (ii) the k-depth and proposed Stackelberg incentive models, Scenario (I)

i ∈ Q l i , and t * i = 0 otherwise. In addition, we have the expression j

Table 1
Notation
c i : Sensing cost incurred per unit of data by worker i, where c i ∈ (0, c].

Table 2
Predictive standard deviation values for Scenario (I).

Table 3
Baseline coverage scores for Scenario (I).

Table 4
Mean squared error values for Scenario (II).

Table 5
Baseline coverage scores for Scenario (II).

Fig. 6 Heatmap of the predictive means in Scenario (II). Left: disk; Middle: k-depth; Right: proposed Stackelberg incentive model

Consider a strategy profile (R, t) ≠ (R SE , t SE ). If U CS (R; t) > U CS (R SE ; t SE ), then ∃i where u i (t i ; t −i , R l i ) < u i (t SE i ; t −i SE , R SE l i ). It can be shown that the previous statement is true since is a necessary condition for U CS (R; t) > U CS (R SE ; t SE ). Hence, we conclude that ∃i where u i (t i ; t −i , R l i ) < u i (t SE i ; t −i SE , R SE l i ). Similarly, it can be shown that if ∃i where u i (t i ; t −i , R l i ) > u i (t SE i ; t −i SE , R SE l i ) and u i (t i ; t −i , R l i ) ≥ u i (t SE i ; t −i SE , R SE l i ), ∀i ∈ I, then U CS (R; t) < U CS (R SE ; t SE ). This is because if ∃i where u i (t i ; t −i , R l i ) > u i (t SE i ; t −i SE , R SE l i ) and u i (t i ; t −i , R l i ) Hence, the proof is complete.
Claim The proposed Stackelberg incentive model may not have a Pareto efficient Stackelberg equilibrium. In other words, suppose that (R, t) ≠ (R SE , t SE ) and U CS (R; t) > U CS (R SE ; t SE ), we have u i (t i ; t −i , R l i ) ≥ u i (t SE i ; t −i SE , R SE l i ), ∀i ∈ I.