1 Introduction

Mobile CrowdSensing (MCS) (Capponi et al. 2019; Liu et al. 2019; Belli et al. 2019) is a widely used computational paradigm designed to exploit data provided by the crowd, through their mobile devices, giving rise to a collaborative approach. More precisely, users, referred to as MCS subscribers, actively participate in the crowdsensing platform by installing a mobile app on their smartphones. Through this app, they can respond to specific tasks requested by the platform and receive services as a reward. A task involves activating sensors present in the smartphone, such as GPS, and can operate without direct user intervention. In such cases, the app initiates data sampling from the sensors and sends the collected data to the MCS in accordance with the specified request, continuing until the task concludes or until the user manually stops it. Alternatively, certain tasks may necessitate explicit user engagement, requiring them to physically reach a specified location and perform a designated action, such as taking a picture at that particular place.

Given the widespread usage of smartphones, the potential for MCS to collect user-generated data on a broad and detailed scale, especially in urban areas, is very high (Cardone et al. 2014; Chessa et al. 2017). This is coupled with the advantage of requiring limited investments for the management and maintenance of the crowdsensing platform. Nevertheless, the effective coverage of the environment by such a platform remains constrained by the number of subscribed users along with their mobility. With the term coverage, we refer to the probability that a site is covered within a confidence region using sensed data. As a general observation, dense areas tend to have high coverage due to the presence of many users who can collect data. Conversely, peripheral regions might lack coverage, resulting in no expected data from such places. For these reasons, constructing a coverage map of the monitored environment is a promising approach to identify covered and uncovered regions. Subsequently, this can aid in planning a data-driven CrowdSensing data collection campaign.

In this study, we explore the methodology of constructing a coverage map for a crowdsensing platform by leveraging knowledge of user mobility. From a practical standpoint, the computation of the coverage map can be achieved through both a centralized and a distributed approach. In the former, the crowdsensing platform retrieves all mobility data from participating users, such as GPS traces, irrespective of their engagement in a task or the task’s requirement for mobility data. This approach, which was preliminarily investigated in our prior work (Girolami et al. 2022a), albeit in a different context, namely Edge computing (Girolami et al. 2022b; Bellavista et al. 2019), involves centrally analyzing the collected mobility data to derive the coverage map of the environment. Specifically, the crowdsensing platform designates a region as covered based on the frequency of user visits and the likelihood of users diverting toward that region. The higher the probability of users detouring towards a region, the greater the likelihood of collecting data from that specific area.

The centralized approach, however, raises privacy concerns for users, as it necessitates the disclosure of their mobility data to create a global coverage map. Therefore, in this study, we introduce an alternative distributed approach to construct a coverage map. In this latter approach, the crowdsensing platform only gathers aggregated data from users, comprising user-generated coverage maps. The individual users independently compute a coverage map by analyzing their own trajectories. The coverage maps are then uploaded to the crowdsensing platform, which subsequently generates an aggregated, anonymous map. We analyze both models and provide an analytical proof that the two models are equivalent, meaning that the aggregate map computed with the distributed approach is identical to the map computed with the centralized model.

We evaluate the performance of both models in an experimental setting utilizing real mobility data from a public dataset. Firstly, we define three experimental scenarios of increasing complexity, demonstrating the equivalence of the centralized and distributed models under similar conditions. Secondly, we examine the performance of the distributed model by varying the two key model parameters. Additionally, we analyze the dataset used for our experiments from a mobility perspective.

The remainder of this work is organized as follows. Section 2 frames the state of the art of coverage computation in crowdsensing scenarios. Section 3 reports our reference scenarios and a first introduction to the centralized and distributed models. Section 4 formally describes the two models, Sect. 5 details our experimental settings and results, and Sect. 6 concludes the paper.

2 Related work

Network coverage, specifically the sensor set with the maximal residual energy to cover all points-of-interest (POIs), is a critical Quality-of-Service parameter, attracting significant research interest in recent years. Beginning with Wireless Sensor Networks (WSN), Chen et al. (2010) proposes sensor deployment strategies to maximize residual energy when covering all POIs. The work by Senouci et al. (2012) introduces an evidence-based sensor coverage model that addresses deployment-related issues, including sensor reliability. Yang et al. (2019) suggests a model for coverage degree, representing the number of sensors required to cover a POI, specifically for visual WSNs deployed in line-of-sight communication. Akbarzadeh et al. (2013) incorporates terrain information into visual WSN deployment, employing a probabilistic approach to model binary coverage. The coverage area can be significantly expanded when end-devices operate in Multiple-Input Multiple-Output Space Division Multiplex (MIMO-SDMA) mode (Kocian et al. 2017). However, mobility in WSNs is typically very limited.

The presence of mobile embedded sensors in smartphones and wearable devices enables the monitoring of environmental parameters on a large scale. User mobility, coupled with ubiquitous computing, enhances context awareness and understanding of user behavior compared to network-connected wireless sensor networks, thereby creating shared value. The success of MCS relies on the extensive participation of users. To effectively motivate users in sensing and reporting information, incentive mechanisms are employed (Capponi et al. 2019). Various research works suggest three types of rewards that encourage users to share their data, including location tagging and activity tracking: (i) gamification; (ii) services; or (iii) monetary rewards (Zhang et al. 2016). Challenges in MCS encompass data quality in terms of accuracy, latency, and data security (Zhang et al. 2021). A comprehensive review, focusing on taxonomy, applications, and challenges, can be found in Boubiche et al. (2019). Finally, an MCS campaign may deploy a combination of static and mobile sensors and utilize sensing interpolation strategies to ensure coverage in less frequented areas (Girolami et al. 2017).

Focusing on MCS applications that explore network coverage, it is worth noting the work of Wu et al. (2021), which addresses fine-grained user profiling for personalized task matching. In contrast, Xu et al. (2022) employs a reinforcement-learning approach to maximize rewards in an online participant selection scheme, incorporating both area coverage ratio and degree. Wang et al. (2018) maximizes the future location coverage of the mobile crowd under a guaranteed location privacy protection scheme. The issue of Coverage-Aware Stable Task Assignment in Opportunistic MCS is addressed by Yucel et al. (2021). Finally, we want to highlight the coverage-guaranteed and energy-efficient participant selection strategy proposed by Ko et al. (2019).

Many current Mobile CrowdSensing (MCS) solutions heavily depend on some form of centralized data fusion. However, this approach encounters three issues: (i) to guarantee spatial coverage of the targeted area, the MCS cloud requires location information of the assigned tasks, inevitably revealing participants’ locations; (ii) raw sensor readings can be easily intercepted, leading to the disclosure of sensitive information to adversaries; (iii) in scenarios where Internet bandwidth is low or the geographical environment causes deep channel fades, the number of active connections can overwhelm the network.

To address these challenges, network requirements are often better met by employing distributed or fog computing (Baresi et al. 2016). The burden on back-end cloud servers can be reduced by providing distributed analysis for submitted data at the fog level (Jayaraman et al. 2015). Based on the Quality-of-Service provided by participants, their selection can be offloaded to the mobile edge nodes (Lamaazi et al. 2020). However, these frameworks assume trust among users, fog nodes, or cloud servers, which is not always the case. To address this issue, Lamaazi et al. (2022) proposes adopting a feedback mechanism to ensure cooperation between edge nodes, aiming to eliminate untrustworthy participants. To safeguard data in MCS, recent research has focused on credible and distributed incentive mechanisms based on blockchain technology (Kadadha et al. 2020; Chen et al. 2022). Distributed Mobile CrowdSensing (MCS) has found numerous real-life applications. For instance, the application discussed in Mowafi et al. (2022) utilizes built-in cameras in smartphones to capture images of environmental disasters from various perspectives. Instead of transmitting all images to the server, they are exchanged with other nodes to avoid sharing redundant photos. This approach results in traffic reduction and energy savings exceeding 20% and 25%, respectively, compared to centralized fusion.

In the application highlighted in Paricio and Lopez-Carmona (2019), individual (static) traffic-weighted multimap creation is employed to precisely control vehicle routes. This multimap reflects link costs based on historic real-time data about the network, traffic status, and driver behavior obtained from distributed embedded measurement rootkits.

Another notable application of MCS involves influencing participant behavior. For example, in Ji et al. (2022), tools from the Mental Accounting Theory are deployed to create accounts for task execution profit and bonus, which are then used to motivate workers to alter their original travel schedules. In Guo et al. (2022), distributed MCS, combined with machine learning tools such as Bidirectional Encoder Representations from Transformers (BERT), is proposed for tracking down fake news (Table 1).

Table 1 An overview on recent distributed MCS: location of algorithm deployment and application

To safeguard participant privacy, it is crucial to ensure that participants' location data cannot be correlated. In line with this approach, Qian et al. (2020) proposes obfuscating participants’ real locations as perturbed locations and achieving optimal task allocation based on these perturbed locations. Differential privacy and distributed user location obfuscation are suggested by Zhou et al. (2022). On the other hand, Shao et al. (2021) and Wu et al. (2023) propose using k-anonymity to protect the privacy of datasets. Sparse Mobile CrowdSensing (MCS) with Differential and Distortion Location Privacy is addressed in Wang et al. (2020).

To simultaneously address user data security and system latency, the work presented in Ning et al. (2022) proposes a blockchain-enabled distributed MCS framework for traffic management.

3 System model

In this study, we focus on a typical MCS scenario in which platform users are equipped with a mobile device, typically a smartphone. These devices are capable of performing various sensing tasks through an MCS app. The term task in this context refers to a specific action executed by a device, such as collecting sensing information using on-board sensors or capturing a picture with the embedded camera. Tasks can be autonomously performed by a device, or they may necessitate explicit user intervention. An MCS architecture also includes a back-end server in charge of:

  • sending tasks to users: this action is required to start a data collection campaign. Some noteworthy examples include gathering information about the quality of WiFi networks in a particular area or collecting data on the noise intensity in a specific region;

  • retrieving collected data: this action is necessary for retrieving and storing data collected by users. The data can be stored on the back-end and subsequently analyzed using post-processing analytics.

Typical MCS systems necessitate a mobile application to transfer the collected information from the user’s device to the back-end, similar to navigation, sports tracker, or recommendation apps. The amount of retrievable data and the regions from which data are collected depend on the number of participating users. Specifically, the higher the number of users in an MCS campaign, the greater the expected data and the broader the covered area. Refer to Table 2 for the adopted notation.

Regarding coverage, peripheral regions not visited by any user are likely to be uncovered, as no sensing information is collected from those areas. In contrast, popular areas are likely to be highly covered, as a significant amount of data is retrieved from the users’ devices. The concept of data coverage is prevalent in popular MCS applications like Google Maps. These applications are built on the idea of collecting and sharing user contributions, such as real-time traffic conditions, road accidents, or slowdowns.

We now turn our attention to the primary objective of this paper, which is to predict the coverage in an MCS system by leveraging user mobility. We quantify the probability of collecting data from a specific set of locations \(L= \{l_h: h\in [1,H]\}\), \(|L| = H\) given users \(U = \{u_k: k\in [1,K] \}\), \(|U| = K\). Let the coverage \(C^h\) be defined as the event that at least one user visits location \(l_h \in L\), and let \(\mathbb {P}(C^h ) \in [0,1]\) denote the probability of coverage at location \(l_{h}\). The coverage map, designed to indicate the service area of all locations L, corresponds to the 2-dim. probability density function of the site coverage if a user is present in a region around the site. As an example, Fig. 1 illustrates the coverage map for \(H = 1954\) locations assuming that each site is equally likely to be visited. Each entry in the map is specified by the triple \((l_{h,x},l_{h,y},\frac{1}{H}\mathbb {P}(C^h))\).

Fig. 1 Example of coverage map C for the locations in L: the x- and y-axes report the longitude and latitude, and the z-axis reports the coverage value for the corresponding location

To achieve this, we propose a mobility-based approach to measure \(C^h\) by identifying highly visited or scarcely visited locations and, in turn, estimating the expected coverage. This approach assumes a positive correlation between a location’s visits and the expected coverage: the more likely a location is visited, the more likely a user will collect data from there.

As anticipated in the introduction, we consider two approaches for predicting the coverage, based on a centralized and on a distributed model, respectively. The centralized model requires that the back-end has full knowledge of the users’ mobility, e.g., the users’ trajectories as GPS coordinates. By analyzing the users’ mobility, it is possible to identify crowded areas and to assume that visited locations are also covered locations. The model also introduces the concept of detour as the possibility for a user to deviate from the original trajectory and to pass close to location \(l_h \in L\). Detouring increases the possibility of collecting data from \(l_h\), and hence increases the coverage \(C^h\). MCS platforms typically encourage detours by providing rewards to the participants (Dasari et al. 2020; Hu et al. 2020). Figure 2 gives a graphical representation of a detour. The figure shows two users, \(u_i\) and \(u_j\), and location \(l_h\) as a red box. The users travel along different trajectories. For each user’s trajectory, we can identify the point nearest to \(l_h\); the corresponding distances are denoted \(d_{i,j}^h\) and \(d_{i',j'}^h\). These are the points from which users will likely accept or decline a detour toward \(l_h\). The advantage of the centralized model is the possibility of building an accurate coverage map, as full details of the users’ mobility are stored on the back-end. Nevertheless, the main drawback is that a user’s mobility represents highly personal information, as it can reveal the identity of a subject (Peng et al. 2017). In this respect, it is worth mentioning the strict rules imposed by the GDPR framework.

Fig. 2 Graphical representation of detour toward location \(l_{h}\)
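As an illustration of how the nearest trajectory point and its distance to \(l_h\) could be obtained from raw GPS samples, the following Python sketch scans a trajectory and returns the minimum great-circle distance. It is a sketch under assumptions that are not part of the model description: trajectories are lists of (latitude, longitude) pairs in degrees, the haversine formula is used as the distance, and the function names and coordinates are hypothetical.

```python
import math

EARTH_RADIUS_M = 6_371_000.0  # mean Earth radius in metres


def haversine_m(lat1, lon1, lat2, lon2):
    """Great-circle distance in metres between two points given in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2)
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(a))


def min_distance_m(trajectory, location):
    """Return (distance, index) of the sampled point nearest to `location`.

    `trajectory` is a list of (lat, lon) samples; only sampled points are
    inspected, so the result is an approximation of the true minimum
    distance between the path and the location.
    """
    lat_h, lon_h = location
    distances = [haversine_m(lat, lon, lat_h, lon_h) for lat, lon in trajectory]
    idx = min(range(len(distances)), key=distances.__getitem__)
    return distances[idx], idx


# Hypothetical trajectory and location (illustrative coordinates only).
trajectory = [(39.90, 116.40), (39.91, 116.40), (39.92, 116.40)]
location = (39.91, 116.41)
distance, nearest_index = min_distance_m(trajectory, location)
print(f"nearest sample: {nearest_index}, distance: {distance:.1f} m")
```

Working on sampled points keeps the sketch short; a denser GPS sampling rate makes the approximation closer to the true point-to-path distance.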

The second model we study in this work is a distributed and privacy-preserving approach. The idea is to let users locally compute the coverage map for a given set of locations L and to store on the back-end only the coverage map of every user i. Differently from the previous model, the back-end is not required to store the users’ trajectories; rather, it only retrieves the coverage map from every user. It is worth mentioning that the coverage map does not disclose detailed information about the user’s mobility; it only provides the probability that user i visits a specific location (the locations L are the same for all the users). In turn, the back-end aggregates the retrieved coverage maps, building a final map. The two proposed models (centralized and distributed) provide the same result: a coverage map for the set L of locations. However, with the centralized model we assume that personal data are transferred to the back-end, while with the distributed one we relax this assumption, as users locally compute their partial coverage maps and then transfer only this aggregated information to the back-end.

Table 2 The adopted notation for the coverage models

4 On the computation of the coverage map

In this section, we formalize the distributed coverage model introduced in Sect. 3, recap the centralized model, and demonstrate their equivalence.

4.1 Distributed model

To start, we need a few definitions and assumptions. For the sake of simplicity, the coverage area is modeled as a circle with radius \(R\) centered at the respective POI located at \(l_{h} \in L\). Let the random variable \(D_{i,j}^h\) capture the distance between the trajectory \(t_{j}\) of \(u_{i}\) and the POI at location \(l_h\). To model the probability distribution for \(D_{i,j}^h\), consider the following scenario. Suppose that user \(u_{i}\), moving along trajectory \(t_{j}\), chooses to leave the trajectory to visit location \(l_h\). The shorter the detour distance \(d^{h}_{i,j}\), the more likely the users leave their trajectories. The exponential distribution is commonly used to model this behavior, as preliminarily proposed in Girolami et al. (2022a). However, beyond the coverage area, the users are too far from the POI. Hence, a better approach is to truncate the distribution at the edge of the coverage area \(R\) according to

$$\begin{aligned} p\left( t\right) = \left\{ \begin{array}{ll} \frac{\lambda \exp {\{- \lambda t\}}}{1- \exp {\{- \lambda R\} }} &{} 0 < t \le R\\ 0 &{} \text {otherwise} \end{array}\right. \end{aligned}$$
(1)

where \(\lambda > 0\). Finally, let us define the deterministic variable \(d^{h}_{i,j}\) denoting the minimum distance between trajectory \(t_{j}\) of user \(u_{i}\) and location \(l_h\). With the definitions made above, we are now ready to compute the probability of the detour event \(D_{i,j}^h \ge d^{h}_{i,j}\) as the tail probability of (1),

$$\begin{aligned} \begin{aligned} \mathbb {P}\left( D_{i,j}^h \ge d^{h}_{i,j} \right)&= \int _{d^{h}_{i,j}}^{R} \frac{\lambda \exp {\{- \lambda t\}}}{1- \exp {\{- \lambda R\} }} dt \\&= \frac{\exp {\{- \lambda d^{h}_{i,j} \}} - \exp {\{- \lambda R\}}}{1- \exp {\{- \lambda R\} }}, \quad 0 < d^{h}_{i,j} \le R. \end{aligned} \end{aligned}$$
(2)

Note that the closer the user is to the location, the higher the probability of a detour. In the following, this probability is referred to as the coverage probability, to match the scope of the paper.
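The following Python sketch evaluates the truncated density in (1) and the resulting coverage probability. It assumes that the tail integral is taken over the support \((0, R]\) of the truncated density, which gives \((e^{-\lambda d} - e^{-\lambda R})/(1 - e^{-\lambda R})\) and keeps the value in \([0,1]\); the function names and the parameter values in the usage lines are illustrative choices.

```python
import math


def truncated_exp_pdf(t, lam, R):
    """Density (1): exponential with rate `lam`, truncated to (0, R]."""
    if not 0 < t <= R:
        return 0.0
    return lam * math.exp(-lam * t) / (1.0 - math.exp(-lam * R))


def coverage_probability(d, lam, R):
    """P(D >= d): integral of the truncated density from d up to R.

    Assumption of this sketch: distances at or beyond R give probability 0,
    and negative distances are clamped to 0.
    """
    if d >= R:
        return 0.0
    tail = math.exp(-lam * R)
    return (math.exp(-lam * max(d, 0.0)) - tail) / (1.0 - tail)


# Illustrative parameters: scale 1/lambda = 50 and radius R = 800.
lam, R = 1 / 50, 800.0
for d in (10.0, 50.0, 200.0, 800.0):
    print(d, round(coverage_probability(d, lam, R), 4))
```

Because the density vanishes outside \((0, R]\), a trajectory whose minimum distance exceeds \(R\) contributes nothing to the coverage of that location.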

In the distributed model, the MCS back-end collects the coverage maps from the individual users instead of their mobility data. This approach limits the amount of sensitive information transferred from users to the back-end. As user \(u_{i}\) does not know the location of \(u_{i'}\), \(i \ne i'\), we start with the probability in (2) that user \(u_{i}\) along trajectory \(t_{j}\) visits site \(l_h\). Given that the users are present in a region around the site \(l_h\), the joint probability that \(l_h\) is not covered by any of the trajectories of \(u_{i}\) is the product

$$\begin{aligned} \mathbb {P}\left( \overline{C_i^h} \right) = \prod _{t_{j} \in T_i} \left( 1 - \mathbb {P}\left( D_{i,j}^h \ge d^{h}_{i,j} \right) \right) \end{aligned}$$
(3)

when the trajectories of \(u_{i}\) are independent. Therefore, the probability that \(l_h\) is covered by at least one trajectory of \(u_{i}\) has the form

$$\begin{aligned} \mathbb {P}\left( C_i^h \right) = 1 - \prod _{t_{j} \in T_i} \left( 1 - \mathbb {P}\left( D_{i,j}^h \ge d^{h}_{i,j} \right) \right) . \end{aligned}$$
(4)

The coverage map from the point of view of \(u_{i}\) is the probability of the union of the (mutually exclusive) events located at \(l_h\), weighted by the density that a user is actually present in a region around the POI:

$$\begin{aligned} p \left( C_i \right) = \sum _{l_{h}} \mathbb {P}\left( C_i^h \right) p\left( l_h \right) . \end{aligned}$$
(5)

With no prior information available, the a-priori density of visited sites, \(p\left( l_h \right)\) is to be assumed uniform on the number of possible locations H. The resulting 2-dim. density has the form of a matrix. The entry in the latitude-th row and longitude-th column corresponds to the coverage probability for the respective location. The pseudo-code of this distributed fusion algorithm in (5) is outlined in Algorithm 1.


Algorithm 1 Distributed computation of coverage map
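The following Python sketch illustrates the per-user computation behind (4) and (5). It is an illustration under stated assumptions rather than a transcription of Algorithm 1: the minimum distances \(d^{h}_{i,j}\) are assumed to be precomputed, the coverage probability is obtained by integrating (1) over its support \((0, R]\), the prior \(p(l_h)\) is uniform (\(1/H\)), and all function names are hypothetical.

```python
import math


def coverage_probability(d, lam, R):
    """Tail of the truncated exponential density (1) on (0, R]."""
    if d >= R:
        return 0.0
    tail = math.exp(-lam * R)
    return (math.exp(-lam * max(d, 0.0)) - tail) / (1.0 - tail)


def user_coverage_map(min_distances, lam, R):
    """Eq. (4) for one user.

    `min_distances[h]` lists the minimum distances between location h and
    each trajectory of the user. Returns P(C_i^h) for every location h.
    """
    cmap = []
    for dists in min_distances:
        not_covered = 1.0
        for d in dists:
            not_covered *= 1.0 - coverage_probability(d, lam, R)
        cmap.append(1.0 - not_covered)
    return cmap


def weighted_entries(cmap):
    """Terms of the sum in (5) with a uniform prior p(l_h) = 1/H."""
    H = len(cmap)
    return [p / H for p in cmap]


# Hypothetical user with three locations and one or two trajectories each.
distances = [[0.0, 900.0], [900.0], [50.0, 50.0]]
local_map = user_coverage_map(distances, lam=1 / 50, R=800.0)
print(local_map)
print(weighted_entries(local_map))
```

Only `local_map` needs to leave the device in this sketch; the distances and the trajectories they were derived from stay local to the user.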

4.2 Centralized model

For the centralized model, the MCS back-end collects mobility data from all the users to extract the coverage map, which may reveal the users’ identity. We briefly recap the coverage map of the centralized model, which is used to benchmark the distributed model formulated above.

When the trajectories of all users are independent, it can be readily seen from (2) that the joint probability for the location \(l_h\) to be uncovered is given by

$$\begin{aligned} \mathbb {P}\left( \overline{C^h} \right) = \prod _{u_{i}} \prod _{t_{j} \in T_i} \left( 1 - \mathbb {P}\left( D_{i,j}^h \ge d^{h}_{i,j} \right) \right) . \end{aligned}$$
(6)

Hence, the probability that \(l_h\) is covered by at least one user is

$$\begin{aligned} \mathbb {P}\left( C^h \right) = 1 - \prod _{u_{i}} \prod _{t_{j} \in T_i} \left( 1 - \mathbb {P}\left( D_{i,j}^h \ge d^{h}_{i,j} \right) \right) . \end{aligned}$$
(7)

The coverage map corresponds to the probability of the union of the events that any site is covered weighted by the density of the users’ presence in the region around the site i.e.,

$$\begin{aligned} p \left( C\right) = \sum _{l_{h}} \mathbb {P}\left( C^h \right) p\left( l_h \right) . \end{aligned}$$
(8)

4.3 Proof of model equality

The two proposed models provide the same outcome, a coverage map for the set L of locations. In this section, we formally prove that the centralized model and the distributed model are equivalent in terms of the obtained results. We start with the distributed model seen from the point of view of user \(u_{i}\). When the users roam around independently, the probability that a given POI \(l_h\) is not covered by any user has the form

$$\begin{aligned} \mathbb {P}\left( \bigcap _{u_{i}} \overline{C_i^h} \right) = \prod _{u_{i}} \left( 1- \mathbb {P}\left( C_i^h \right) \right) \end{aligned}$$
(9)

Substituting (3) into (9), it follows that

$$\begin{aligned} \mathbb {P}\left( \bigcap _{u_{i}} \overline{C_i^h} \right) = \prod _{u_{i}} \prod _{t_{j} \in T_i} \left( 1 - \mathbb {P}\left( D_{i,j}^h \ge d^{h}_{i,j} \right) \right) \end{aligned}$$
(10)

The probability that at least one user covers this POI equals one minus the above result, yielding

$$\begin{aligned} 1 - \mathbb {P}\left( \bigcap _{u_{i}} \overline{C_i^h} \right)&= 1 - \prod _{u_{i}} \prod _{t_{j} \in T_i} \left( 1 - \mathbb {P}\left( D_{i,j}^h \ge d^{h}_{i,j} \right) \right) \\&= \mathbb {P}\left( C^h \right) \end{aligned}$$
(11)

which is equal to the corresponding joint probability for POI \(l_h\) in (7) of the centralized model.
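The equivalence can also be checked numerically. The Python sketch below draws random minimum distances for a few hypothetical users, builds the per-user maps with (4), aggregates them with (9)-(11), and compares the result with the centralized map in (7). The data are synthetic, the names are illustrative, and the coverage probability is assumed to be the integral of (1) over \((0, R]\); the identity itself holds for any per-trajectory probability in \([0,1]\).

```python
import math
import random


def coverage_probability(d, lam, R):
    """Tail of the truncated exponential density (1) on (0, R]."""
    if d >= R:
        return 0.0
    tail = math.exp(-lam * R)
    return (math.exp(-lam * max(d, 0.0)) - tail) / (1.0 - tail)


def user_map(user_dists, lam, R):
    """Eq. (4): user_dists[j][h] is the distance of trajectory j to location h."""
    out = []
    for h in range(len(user_dists[0])):
        miss = 1.0
        for traj in user_dists:
            miss *= 1.0 - coverage_probability(traj[h], lam, R)
        out.append(1.0 - miss)
    return out


def aggregate(user_maps):
    """Eqs. (9)-(11): combine per-user maps without any trajectory data."""
    out = []
    for h in range(len(user_maps[0])):
        miss = 1.0
        for m in user_maps:
            miss *= 1.0 - m[h]
        out.append(1.0 - miss)
    return out


def centralized_map(all_dists, lam, R):
    """Eq. (7): product over all users and all of their trajectories."""
    out = []
    for h in range(len(all_dists[0][0])):
        miss = 1.0
        for user_dists in all_dists:
            for traj in user_dists:
                miss *= 1.0 - coverage_probability(traj[h], lam, R)
        out.append(1.0 - miss)
    return out


# Synthetic data: K users, 1 to 4 trajectories each, H locations.
rng = random.Random(0)
K, H = 5, 8
all_dists = [
    [[rng.uniform(0.0, 1000.0) for _ in range(H)] for _ in range(rng.randint(1, 4))]
    for _ in range(K)
]
lam, R = 1 / 50, 800.0
dist_map = aggregate([user_map(u, lam, R) for u in all_dists])
cent_map = centralized_map(all_dists, lam, R)
max_gap = max(abs(a - b) for a, b in zip(dist_map, cent_map))
print("largest absolute difference:", max_gap)
```

Any remaining difference between the two maps is due to floating-point rounding, since both compute the same product in a different order.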

5 Experimental settings

We now detail the experimental settings for the tests executed with the centralized and distributed models. The objective of the experiments described in this section is twofold. On the one hand, we experimentally demonstrate the equivalence of the two proposed models. As demonstrated in Sect. 4.3, the centralized and the distributed models provide the same results, but with different privacy levels. On the other hand, we focus on the distributed model, as it guarantees a higher privacy level than the centralized model and thus represents a valid candidate for a real deployment in MCS systems.

5.1 The experimental mobility dataset

Our experiments are based on a real-world experimental mobility dataset, namely GeoLife (Zheng et al. 2008, 2009). The dataset was collected by Microsoft Research Asia and involves 182 participants recruited on a volunteer basis. An overview of the dataset is shown in Fig. 3.

Fig. 3 Overview of the GeoLife dataset (Map data copyrighted OpenStreetMap contributors and available from https://www.openstreetmap.org)

The collected data include GPS coordinates of participants obtained with GPS trackers or, in some cases, with the user’s device. The dataset covers an extended period from April 2007 to August 2012. Some of the users’ trajectories are labelled with the adopted transportation mode, e.g., car, bus, or metro. The data come from the Beijing area; the dataset is publicly available and widely adopted in the current literature.

Figure 4 shows how the number of GPS points varies over time on a weekly basis, while Fig. 5 reports the number of GPS data aggregated on a monthly basis.

Fig. 4 Weekly number of GPS points

Fig. 5 Number of GPS data aggregated per month

From Fig. 4 we can observe that the time window from late 2008 to early 2010 represents the densest period in terms of available data. For the purpose of this work, we restrict the analysis to a sub-period, and we also restrict the geographic extension of the dataset. In particular, we focus on 2009 and we crop the data to the city of Beijing. The inset in Fig. 4 shows the variation of the collected GPS data in May 2009 only. During this limited period, the dataset provides 982,304 GPS points for a total of 737 distinct trajectories.

5.2 Evaluation metrics

To quantitatively measure the similarity between the centralized and the distributed models (see Sect. 4.3), we empirically generate coverage maps and compare their distributions. There are several statistical tests available in the literature, such as the two-sample Anderson-Darling and the two-sample Kolmogorov-Smirnov (KS) tests. Both tests are non-parametric and do not require normality. We have chosen to rely on the latter, as it is simpler to implement. In the following, we briefly review the KS test (Wang and Wang 2010).

To get started, the two empirical coverage maps are transformed into empirical cumulative distribution functions (CDFs). The KS test takes the two empirical CDFs and returns two values, namely the KS statistic and the p value. The KS statistic corresponds to the maximum absolute deviation between the two empirical CDFs. For the p value, the null hypothesis is that the two samples are drawn from populations with the same distribution; the p value is then the probability, under the null hypothesis, of observing a KS statistic at least as large as the one actually observed. When the two samples coincide, the KS statistic is 0 and the p value is 1; more generally, a small KS statistic together with a large p value gives no evidence against the null hypothesis.
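As an illustration, the two-sample KS statistic can be computed directly from the sorted samples. The Python sketch below is a plain standard-library implementation of the statistic only (no p value); the function name and the sample values are hypothetical. In practice, a library routine such as `scipy.stats.ks_2samp`, which returns both the statistic and the p value, would typically be used instead.

```python
def ks_statistic(sample_a, sample_b):
    """Maximum absolute gap between the empirical CDFs of two samples."""
    a, b = sorted(sample_a), sorted(sample_b)
    n, m = len(a), len(b)
    i = j = 0
    d = 0.0
    while i < n and j < m:
        x = min(a[i], b[j])
        # Advance both samples past x so that ties are handled consistently.
        while i < n and a[i] <= x:
            i += 1
        while j < m and b[j] <= x:
            j += 1
        # i / n and j / m are the two empirical CDFs evaluated at x.
        d = max(d, abs(i / n - j / m))
    return d


# Hypothetical coverage values from two maps over the same locations.
centralized = [0.10, 0.35, 0.35, 0.80, 0.95]
distributed = [0.10, 0.35, 0.35, 0.80, 0.95]
print(ks_statistic(centralized, distributed))
```

Once one sample is exhausted its empirical CDF equals 1, so the gap can only shrink afterwards and the loop can stop safely.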

5.3 Experimental results

We first assess the equivalence of the two models. To this purpose, we consider three scenarios of increasing complexity and of increasing duration. The goal is to compute the coverage maps resulting from the centralized and distributed models, and to execute the KS test, as detailed in Sect. 5.2. The result of the KS test provides a reliable indication of the model’s equality.

The considered scenarios are all based on data extracted from the GeoLife dataset (see Sect. 5.1), as reported in Fig. 4. The scenarios we consider are the following:

  • Scenario 1: May 1, 2009 to May 7, 2009;

  • Scenario 2: May 1, 2009 to May 15, 2009;

  • Scenario 3: May 1, 2009 to May 31, 2009;

Concerning the locations of interest L, we extract from the Beijing area a collection of places labeled with the following tags: square, monument, mall, station and bus, extended with a set of random points from the city center. As a result, we obtain 1954 locations that are exploited to compute the coverage map with the two models. For optimized visualization, both models are configured with a detour radius \(R = 800\) m and a scale parameter \(1/\lambda = 50\). Table 3 lists the results of the KS test applied to the three scenarios. In all cases the p values are close to one; hence, the test gives no evidence of a difference between the two maps, in line with the equivalence shown in Sect. 4.

Table 3 Results of the KS test for the three test scenarios

We also show a graphical representation of the coverage maps obtained with the centralized and distributed models. Fig. 6 shows a 3D representation of the models for the 3 scenarios. The graphs report for each location \(l_h \in L\), the corresponding coverage value (z-axis).

Fig. 6 3D representation of the coverage map with the centralized and distributed models for the 3 testing scenarios

Now, our focus shifts to the distributed model, which serves as a privacy-preserving approach for computing coverage. In particular, we examine two orthogonal aspects: the impact of the scale \(1/\lambda\) and of the detour radius R on the resulting coverage maps. To achieve this, we vary the scale parameter over the values [50, 100, 500, 800] and R over the values [100, 200, 500, 800]. Results have been obtained by considering Scenario 2, ranging from May 1, 2009 to May 15, 2009, but similar results also apply to the other scenarios. The scale parameter modifies the truncated exponential distribution reported in Eq. (1) and shown in Fig. 7. In particular, the scale parameter affects the dispersion of the exponential distribution that is used to model the preferential distance at which a user will likely accept a detour toward a location. We observe that the smaller the scale, the more users will accept a detour only at short distances. Conversely, the higher the scale, the more likely users will accept a detour also at large distances from the target location.

Fig. 7 Impact of the scale parameter and detour radius on the distributed model

The detour radius R is employed to determine which points along the user’s trajectory are included or excluded, as illustrated in Fig. 7. When R increases, more points can be included in the coverage computation for a given user, while a contraction in R results in a reduction of the included points. The results of the variations in scale and R are depicted in Fig. 8: the rows display the scale variations in the interval 50–800, while the columns show the variations in R from 100 to 800 m. From the figure, we observe two distinct trends. On one hand, increasing the scale does not significantly alter the resulting distributions; they still exhibit the same trend. On the other hand, increasing R has the effect of modifying the distributions. As R increases, the peak of the distribution shifts from left to right, indicating that locations become highly covered with higher values of R.

Fig. 8: Analysis of the impact of \(\lambda\) and R on the distributed model

6 Discussion and conclusions

The adoption of the MCS paradigm has enabled the collection of representative datasets by leveraging the sensing units on user devices. The volume of data and the set L of regions from which data are collected are closely tied to user mobility. Frequently visited regions are typically well covered by the collected data, whereas peripheral regions with low visitation rates are expected to remain uncovered. However, a crowdsensing platform may incorporate incentive mechanisms to motivate users to detour towards specific regions.

One approach to evaluating the coverage of an MCS system over a set L of locations is to collect all data related to users’ mobility, such as GPS trajectories, at the centralized server of the MCS. This information is then used to identify highly and scarcely visited regions and to apply a detour probability. However, this model raises inherent privacy concerns: it requires all users to disclose their personal mobility, which may deter them from participating in the MCS platform.

To provide stronger privacy guarantees for users, we propose an alternative distributed model for computing coverage. This model is based on collecting aggregated and anonymized data to construct the coverage map for the set L of regions. Users upload only their coverage maps, without any information about their personal mobility. The uploaded coverage maps are then merged in the cloud to obtain the aggregated map.
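The merging step can be sketched as follows. This example rests on assumptions that are not stated in this section: each user's map is taken to be a dictionary from a location identifier to the probability that the user covers that location, users are treated as independent, and a location counts as covered when at least one user covers it. The function name `merge_coverage_maps` is hypothetical, and the actual merge rule of the model may differ.

```python
def merge_coverage_maps(user_maps):
    """Merge per-user coverage maps into one aggregated map.

    Assumed rule (illustrative, not the paper's): users are independent and a
    location is covered if at least one user covers it, so the merged
    probability is 1 - prod(1 - p_u) over the users reporting that location.
    """
    not_covered = {}
    for cmap in user_maps:
        for location, p in cmap.items():
            # Accumulate the probability that no user so far covers the location.
            not_covered[location] = not_covered.get(location, 1.0) * (1.0 - p)
    return {location: 1.0 - q for location, q in not_covered.items()}

# Two made-up user maps; no trajectories or timestamps are exchanged.
maps = [{"cell_a": 0.5, "cell_b": 0.2}, {"cell_a": 0.5}]
aggregated = merge_coverage_maps(maps)
print(aggregated)
```

Note that the server only sees per-location values in this sketch, never the points from which they were derived, which is the property that the distributed model relies on.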

To evaluate the two models, we compare their outputs using a publicly available mobility dataset, namely GeoLife. Specifically, we define three experimental scenarios of increasing complexity and apply the KS test to assess whether the outputs of the two models are equivalent. Our tests reveal that the two models produce similar results while imposing different privacy requirements. Furthermore, we analyze the performance of the distributed model by varying two core settings: the detour radius and the scale parameter.
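A two-sample KS comparison of this kind can be sketched in plain Python. The example below is a generic illustration, not the evaluation code used in the experiments: the function name `ks_two_sample` is hypothetical, the input values are made up, and the decision uses the usual large-sample critical value rather than an exact p-value.

```python
import math
from bisect import bisect_right

def ks_two_sample(xs, ys, alpha=0.05):
    """Two-sample Kolmogorov-Smirnov statistic with an asymptotic decision.

    Returns (D, critical_value, reject), where D is the largest gap between
    the two empirical CDFs and `reject` is True when D exceeds the
    large-sample critical value at significance level alpha.
    """
    xs, ys = sorted(xs), sorted(ys)
    n, m = len(xs), len(ys)
    d = 0.0
    # The supremum of |F_x - F_y| is attained at one of the sample points.
    for v in xs + ys:
        fx = bisect_right(xs, v) / n
        fy = bisect_right(ys, v) / m
        d = max(d, abs(fx - fy))
    c_alpha = math.sqrt(-0.5 * math.log(alpha / 2.0))
    critical = c_alpha * math.sqrt((n + m) / (n * m))
    return d, critical, d > critical

# Made-up coverage values standing in for the outputs of the two models.
centralized = [0.10, 0.25, 0.40, 0.55, 0.70, 0.85]
distributed = [0.12, 0.22, 0.41, 0.57, 0.69, 0.88]
d, critical, reject = ks_two_sample(centralized, distributed)
print(f"D={d:.3f}, critical={critical:.3f}, reject equality: {reject}")
```

When the statistic stays below the critical value, the test does not reject the hypothesis that the two samples come from the same distribution, which is the sense in which two models can be said to perform similarly.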

It should be noted, however, that while the distributed model enhances user privacy, it is more exposed to malicious or unfaithful users who may inject poisoned location data into the system. The purpose of such actions could be to alter the resulting coverage map and launch attacks on the MCS platform. This vulnerability exists in both the centralized and distributed models. However, in the centralized model, the server possesses the GPS locations of individual users, which provides more opportunities to filter out poisoned data. In contrast, in the distributed model poisoned data may be exchanged with honest users, potentially affecting the local coverage maps computed by numerous users over time. This is a limitation of the current work.

Therefore, in our ongoing and future work, we aim to better understand the implications of malicious users participating in an MCS initiative. Specifically, we are investigating scenarios in which malicious users intentionally disseminate inaccurate coverage maps that do not reflect actual mobility patterns, and how this could adversely affect the aggregated coverage map computed by the back-end server. To address this concern, we plan to identify attacks carried out by users through data poisoning. We intend not only to develop measures for identifying these malicious users but also to implement countermeasures for purging the collected data of false coverage maps. Future efforts will also focus on leveraging local, short-range communication technologies such as Bluetooth or WiFi to facilitate the exchange and merging of coverage maps among users in proximity.