1 Introduction

With the ubiquitous deployment of wireless networks and mobile devices (e.g., smart phones), spatial crowdsourcing (SC), an emerging paradigm utilizing the distributed mobile devices to monitor diverse phenomena about human activities, has attracted much attention from both academic and industry communities. The main idea of spatial crowdsourcing is recruiting a set of available workers to perform the location-specific tasks by physically traveling to these locations, called task assignment.

Most existing SC research focuses on single task assignment [20, 22], which assumes that tasks are simple and that each task can only be assigned to a single worker. For example, Tong et al. [23] design several efficient greedy algorithms to solve the proposed global online micro-task allocation (GOMA) problem in spatial crowdsourcing. The work in [12] considers task assignment and scheduling at the same time, developing an approximate approach that iteratively improves the assignment and scheduling to achieve more completed tasks. However, in real-world scenarios, an individual worker may not be able to perform a complex task (e.g., monitoring the traffic flow in an area or moving heavy stuff) independently, since completing the task alone exceeds the capability of this worker. In such scenarios, each task should be assigned to a group of workers, which is named Group Task Assignment.

Group task assignment requires a group of workers to perform each task by physically traveling to the location of this task at a particular time. Some previous studies have explored the group task assignment problem in spatial crowdsourcing. For instance, [13] proposes a team-oriented task planning (TOTP) problem, which makes feasible plans for workers and satisfies the skill requirements of different tasks on workers. Gao et al. [14] develop a Top-k team recommendation framework in spatial crowdsourcing, in which a team leader is appointed among each recommended team of workers in order to coordinate different workers conveniently. Cheng et al. [6] consider the collaboration in task assignment, in which workers are required to cooperate and accomplish the tasks jointly for achieving high total cooperation quality scores. Nevertheless, these studies fail to effectively incorporate the group preference, which is an essential factor for improving the quality of group task assignment in spatial crowdsourcing, as the group members may not be willing to perform the task assigned to them when they are not interested in it. We next illustrate the group task assignment problem through a motivating example.

Figure 1 shows an example of the group task assignment problem, in which each task is required to be assigned to two workers. There exist five workers \(w_1,\ldots ,w_5\) and two tasks \(s_1, s_2\). Each worker is associated with her current location, her reachable distance range and her movement speed. For the sake of simplicity, we set the movement speed of each worker to 1 in this running example. Each task is labeled with its location where it will be performed. In addition, Fig. 1 also depicts the preferences of different available worker groups for each task. The problem is to assign tasks to suitable worker groups so as to maximize the total task assignments.

Fig. 1 Running example

In SC, it is an intuitive move to assign the nearby tasks to workers without violating the spatiotemporal constraints (i.e., the assigned tasks should be located in the reachable ranges of the corresponding workers and workers can arrive in the locations of assigned tasks before the deadlines of tasks). Therefore, we can obtain a task assignment, \(\{{<}s_1,\{w_1,w_2\}{>},{<}s_2,\{w_4,w_5\}{>}\}\), with the overall group preference of 0.33. Nevertheless, when we assign the worker group, \(\{w_4,w_5\}\), to task \(s_2\), the group is likely to quit performing \(s_2\) as they show little interest in \(s_2\) (i.e., the group preference on \(s_2\) is only 0.04), which may leave \(s_2\) uncompleted. If we assign tasks by giving higher priorities to the worker groups who are more interested in the tasks, we can get the task assignment result, \(\{{<}s_1,\{w_2,w_3\}{>},{<}s_2,\{w_1,w_4\}{>}\}\), the total group preference of which is 0.78.

In this paper, we develop a group task assignment framework based on worker groups’ preferences. The framework comprises two primary components. First, we utilize the bipartite graph embedding model (BGEM) [31] and the attention mechanism to learn the embedding of task categories and worker groups in a low-dimensional space from group–task interaction data. In order to overcome the data sparsity problem, we integrate the worker–task interaction data and social network structure information (which is used for extracting the social impact of workers) during the process of preference modeling. Second, we apply the tree-decomposition-based algorithm [35] to assign tasks to worker groups to maximize the task assignments by giving higher priorities to the worker groups that show more interest in the tasks.

The first limitation of our previous study [17] is that, although it can exploit social network features by a linear approach, it fails to capture the nonlinear and complex structures of these features. Since the underlying network structure is complex and it is necessary to take into consideration the interactions among features in a nonlinear way, we apply an unsupervised deep learning model, called stacked denoising autoencoders (SDAE) [28], to learn the complex interactions among social network features, which is described in Sect. 3.3.

The second limitation is that, although our preliminary work [17] has already achieved the optimization goal of maximizing the overall task assignments by taking social impact-based preferences of worker groups into account, it fails to consider the members’ disagreement (i.e., the level at which group members disagree with each other). In group activities, it is more desirable to participate in an activity that all the group members are interested in with high consensus than to attend an activity that polarizes group members, even if the latter has higher preferences among them [19]. As a group activity, an effective group task assignment tends to ask a group of workers with high agreement to perform a task (such as moving heavy stuff) simultaneously. Therefore, in Sect. 4.2, we combine the members’ disagreement with the group preference in the phase of group task assignment. More specifically, we calculate a consensus score for each worker group, which includes two aspects. The first aspect is the worker group’s preference (i.e., social impact-based preference), which reflects the degree to which the task is preferred by the worker group members. The second aspect is the group members’ disagreement, which reflects the level at which members disagree with each other. The aim of the group task assignment is to maximize the total task assignments by giving priority to the worker groups with higher consensus scores on tasks.

In summary, the major value-added extensions over our preliminary work [17] are as follows:

  • We identify and study in depth two limitations in our previous framework, which include failing to capture the nonlinear interactions among social network features and failing to consider the disagreement factor among group members.

  • We employ the stacked denoising autoencoders (SDAE) method to learn the nonlinear interactions among social network features by exploring their complex inherent structures.

  • We incorporate the consensus score into the group task assignment process, which tries to formalize the members’ disagreement and group preference to weaken the polarization among group members.

  • Extensive experiments are conducted to study the impact of the key parameters and the effectiveness of our proposed algorithms. In particular, compared with the original exact task assignment approach, our proposed task assignment method with the deep learning strategy (i.e., stacked denoising autoencoders) and the group consensus strategy can improve the task assignment success rate by up to 31.58%, which enhances the effectiveness of task completion.

The remainder of this paper is organized as follows. The preliminary concepts and framework are introduced in Sect. 2. We then present the preference modeling algorithm in Sect. 3, including the proposed deep learning method in Sect. 3.3. Next, the task assignment algorithm taking the consensus (in Sect. 4.2) into consideration is presented in Sect. 4, followed by the experimental results in Sect. 5. Section 6 surveys the related work, and Sect. 7 concludes this paper.

2 Problem Statement

In this section, we briefly introduce a set of preliminary concepts and then give an overview of our framework. Table 1 summarizes the major notations used in the rest of the paper.

Table 1 Summary of notations

2.1 Preliminary Concepts

Definition 1

(Spatial Task) A spatial task, \(s=<s.l, s.p, s.e, s.c,s.numW>\), is a task to be performed at location s.l that is published at time s.p and expires at time s.e, where s.l : (x, y) is a point in the 2D space. Each task s is also labeled with a category s.c (e.g., moving heavy stuff), and s.numW is the number of workers allowed to be assigned to perform s at the same time instance.

Definition 2

(Worker) A worker, \(w=<w.l, w.r, w.on, w.off, w.speed>\), is a carrier of a mobile device who volunteers to perform spatial tasks. A worker can be in either online or offline mode. A worker is online when she is ready to accept tasks. An online worker is associated with her current location w.l, her movement speed w.speed, and her reachable circular range with w.l as the center and w.r as the radius, within which w can accept assignment of spatial tasks. Besides, a worker is associated with her online time, w.on, and her offline time, w.off, before which the worker can be assigned tasks.

In our model, a worker can handle only one task at a certain time instance, which is reasonable in practice. Once the server assigns a task to a worker, the worker is considered to be offline until she completes the assigned task.

Definition 3

(Available Worker Set) Given a task s to be assigned and a set of workers in the vicinity of s, the available worker set for task s, denoted as \({\mathrm{AWS}}(s)\), should satisfy the following three conditions: \({\forall }w \in {\mathrm{AWS}}(s)\):

  1. \(t_{\mathrm{now}} + t(w.l, s.l) \le s.e\), and

  2. \({\mathrm{d}}(w.l, s.l) \le w.r\), and

  3. \(t_{\mathrm{now}} + t(w.l, s.l) \le w.off,\)

where \(t_{\mathrm{now}}\) is the current time, \(t(w.l,s.l) = {\mathrm{d}}(w.l, s.l) / w.speed\) is the travel time from w.l to s.l and \({\mathrm{d}}(w.l, s.l)\) is the travel distance (e.g., Euclidean distance) between w.l and s.l.
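Definition 3 can be read as a simple filter over nearby workers. The following is a minimal Python sketch of that filter, assuming Euclidean travel distance; the class and field names (`Task`, `Worker`, `loc`, `expire`, `radius`, `off`, `speed`) are illustrative choices and are not part of the original framework.

```python
import math
from dataclasses import dataclass


@dataclass
class Task:
    loc: tuple      # s.l, a point (x, y)
    expire: float   # s.e


@dataclass
class Worker:
    loc: tuple      # w.l
    radius: float   # w.r
    off: float      # w.off
    speed: float    # w.speed


def dist(a, b):
    """Euclidean travel distance d(a, b) (an assumption; any metric would do)."""
    return math.hypot(a[0] - b[0], a[1] - b[1])


def available_worker_set(task, workers, t_now):
    """Return the workers that satisfy conditions 1-3 of Definition 3."""
    aws = []
    for w in workers:
        d = dist(w.loc, task.loc)
        arrival = t_now + d / w.speed          # t_now + t(w.l, s.l)
        if arrival <= task.expire and d <= w.radius and arrival <= w.off:
            aws.append(w)
    return aws


task = Task(loc=(0.0, 0.0), expire=5.0)
workers = [
    Worker(loc=(3.0, 4.0), radius=6.0, off=10.0, speed=1.0),   # d = 5, arrives at s.e
    Worker(loc=(3.0, 4.0), radius=4.0, off=10.0, speed=1.0),   # task outside her range
    Worker(loc=(6.0, 8.0), radius=20.0, off=20.0, speed=1.0),  # d = 10, arrives too late
]
aws = available_worker_set(task, workers, t_now=0.0)
```

In this toy instance only the first worker passes all three checks; the second violates the range condition and the third violates the task deadline.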

Definition 4

(Available Worker Group) Given a task s to be assigned and its available worker set \({\mathrm{AWS}}(s)\), the available worker group for task s, denoted as \({{\mathrm{AWG}}}(s)\), should satisfy the following three conditions:

  1. \( {\mathrm{AWG}}(s)\subset {\mathrm{AWS}}(s)\), and

  2. \( \vert {\mathrm{AWG}}(s) \vert = s.numW\), and

  3. \({\forall } w_i, w_j \in {\mathrm{AWG}}(s), t_{\mathrm{now}} + t(w_i.l, s.l) \le w_j.off,\)

where \(\vert {\mathrm{AWG}}(s) \vert \) denotes the number of workers in \({{\mathrm{AWG}}}(s)\).

In the rest of the paper, we will use worker group and group interchangeably when the context is clear.

Definition 5

(Spatial Task Assignment) Given a set of workers \(W_i\) and a set of tasks \(S_i\) at time instance \(t_i\), a spatial task assignment, denoted by \(A_i\), consists of a set of \(<task, {\mathrm{AWG}}>\) pairs in the form of \(<s_1,{\mathrm{AWG}}(s_1)>,<s_2,{\mathrm{AWG}}(s_2)>,\cdots \). We use \(\vert A_i \vert \) to denote the number of task assignments.

Problem Statement: Given a set of workers \(W_i\) and a set of tasks \(S_i\) at the current time instance \(t_i\) on an SC platform, the group task assignment (GTA) problem aims to find the optimal assignment with the maximum number of task assignments (i.e., \({\mathrm{max}}\{|A_i|\}\)) by considering the consensus among group members.

2.2 Framework Overview

As shown in Fig. 2, our framework consists of two major components: (1) social impact-based preference modeling (SIPM) for worker groups; and (2) preference-based group task assignment (PGTA) based on worker groups’ preferences.

In the SIPM procedure, inspired by the success of [2, 30] in learning (user) group preference based on both user–item and group–item interaction data, we utilize the bipartite graph embedding model (BGEM) and attention mechanism to obtain each worker group’s preference on different categories of tasks by simultaneously leveraging both worker–task and group–task interaction data. Note that we say a worker interacts with a task if she has performed this task. More specifically, we utilize BGEM to model the individual interaction (i.e., worker–task interaction) and group interaction (i.e., group–task interaction) to learn the vector representations of workers and task categories in a low-dimensional space, respectively. Since the worker groups in spatial crowdsourcing are often formed in an ad hoc manner (called occasional groups) without any prior interaction with tasks, the group interaction data is sparse, and we cannot effectively learn the vector representation of groups directly. To solve this problem, we introduce workers’ social impact, which represents workers’ weights in a group when making decisions about task selection.

Fig. 2 Framework of our model

In particular, we integrate the worker–task interaction data with group–task interaction data to construct a social network, from which we extract the social network information by a deep learning method called stacked denoising autoencoders (SDAE). In order to alleviate the sparsity of group–task interaction data, we employ a joint optimization approach to combine group–task interaction data with worker–task interaction data, through which we obtain the embedding vectors of workers and task categories as well as workers’ weights (i.e., workers’ social impact). At the same time, the group vector can be calculated by the attention mechanism, which assigns different weights to different workers. Finally, we obtain the group preference on task categories by taking the dot product between the group vector and the task category vector.

In the PGTA phase, given a set of workers and tasks to be assigned, we first obtain the available worker groups (AWGs) for each task by considering trip constraints, i.e., workers’ reachable range, workers’ available time and tasks’ expiration time. Then, we calculate the consensus scores (including group preference and members’ disagreement) of AWGs. Finally, we employ the optimal task assignment (OTA) algorithm based on tree decomposition to assign tasks to suitable worker groups in order to maximize the total task assignments while giving higher priorities to worker groups with higher consensus scores on tasks.

3 Social Impact-Based Preference Modeling

In this section, we first elaborate how the bipartite graph embedding model (BGEM) [31] learns each worker’s embedding vector (representing her preference on different task categories) and each task category’s embedding vector based on the historical worker–task interaction data (a.k.a. individual interaction data). Then, in the group interaction modeling, we extract workers’ social impact from the social network, in which we utilize a deep learning method (i.e., SDAE), and employ the attention mechanism [2] to adapt the social impact to different worker groups. Finally, we design a joint optimization strategy, which can obtain the preference of each group on task categories by simultaneously leveraging both worker–task and group–task interaction data.

3.1 Individual Interaction Modeling

Given the interactions between workers and tasks, i.e., worker–task interaction data, we first construct a bipartite graph, \({\mathcal {G}}_{WC} = (W\cup C, E_{WC})\), where W denotes the worker set, C denotes all the categories of tasks, \(W\cup C\) is the node set of \({\mathcal {G}}_{WC}\), and \(E_{WC}\) is the set of edges between workers and task categories. An edge \(e_{ij}\) (\(\in E_{WC}\)) exists when worker \(w_i\) (\(\in W\)) has performed the tasks with category \(c_j\) (\(\in C\)). The weight \(h_{ij}\) of edge \(e_{ij}\) is set as \(h_{ij} = \frac{N^{c_j}_{w_i}}{N_{w_i}}\), where \(N^{c_j}_{w_i}\) denotes the number of tasks (with category \(c_j\)) worker \(w_i\) has performed and \(N_{w_i} = \sum _{c\in C}N^{c}_{w_i}\) denotes the total number of tasks \(w_i\) has performed.
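As an illustration of the edge weights defined above, the following Python sketch computes \(h_{ij}\) from a log of performed tasks. The input format (a list of worker/category pairs) and the function name `edge_weights` are assumptions made for this example only.

```python
from collections import Counter


def edge_weights(history):
    """history: list of (worker_id, category) pairs, one per performed task.

    Returns {(worker_id, category): h_ij}, where h_ij is the number of tasks
    of category c_j performed by w_i divided by all tasks performed by w_i.
    """
    per_pair = Counter(history)                     # N^{c_j}_{w_i}
    per_worker = Counter(w for w, _ in history)     # N_{w_i}
    return {(w, c): n / per_worker[w] for (w, c), n in per_pair.items()}


history = [
    ("w1", "delivery"), ("w1", "delivery"), ("w1", "delivery"),
    ("w1", "photo"),
    ("w2", "photo"),
]
h = edge_weights(history)
```

Because each worker's weights are fractions of her own task count, they sum to 1 over the categories she has performed.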

Due to the success of BGEM [31] in learning the embedding of heterogeneous interaction entities, we employ it to model the individual worker–task interaction. For a given worker \(w_i\), the probability of \(w_i\) interacting with the tasks with category \(c_j\) is calculated as follows:

$$\begin{aligned} p(c_j|w_i) = \frac{\exp ({\mathbf {w_i}} \cdot {\mathbf {c_j}})}{\sum _{c \in C} \exp ({\mathbf {w_i}} \cdot {\mathbf {c}})}, \end{aligned}$$
(1)

where \({\mathbf {w_i}}\) is the embedding vector of worker \(w_i\) representing her preference and \({\mathbf {c_j}}\) is the embedding vector of task category \(c_j\).
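Eq. 1 is a softmax over dot products. A minimal pure-Python sketch is given below; the embedding values and the helper names (`dot`, `category_probs`) are made up for illustration, and subtracting the maximum score is a standard numerical-stability step that does not change the result.

```python
import math


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def category_probs(worker_vec, category_vecs):
    """Eq. 1: softmax over dot products between one worker and all categories."""
    scores = {c: dot(worker_vec, vec) for c, vec in category_vecs.items()}
    m = max(scores.values())                       # shift for numerical stability
    exps = {c: math.exp(s - m) for c, s in scores.items()}
    z = sum(exps.values())
    return {c: e / z for c, e in exps.items()}


w_i = [1.0, 0.0]                                   # hypothetical worker embedding
C = {"delivery": [2.0, 0.0], "photo": [0.0, 3.0]}  # hypothetical category embeddings
p = category_probs(w_i, C)
```

Here the worker's embedding is aligned with the "delivery" vector, so that category receives the larger probability.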

In the sequel, we define the objective function of the BGEM. Following [18], the target of BGEM is to minimize the KL divergence between \({\hat{p}}(\cdot |w_i)\) and \(p(\cdot |w_i)\), which represent the empirical distribution and the estimated neighbor probability distribution for each worker \(w_i\in W\), respectively.

We employ \(d_i\) to represent the outdegree of worker node \(w_i\), which can be calculated as \(d_i = \sum _{c_j \in C} h_{ij}\) (where \(h_{ij}\) denotes the weight of the edge \(e_{ij}\)). We define the empirical distribution \({\hat{p}}(c_j|w_i) = h_{ij}/d_i\). Thus, the objective function can be obtained as follows:

$$\begin{aligned}&O_{WC} = - \sum _{e_{ij} \in E_{WC}} h_{ij} \log p(c_j|w_i)\nonumber \\&\quad = - \sum _{e_{ij} \in E_{WC}} \frac{N^{c_j}_{w_i}}{N_{w_i}} \log \frac{\exp ({\mathbf {w_i}} \cdot {\mathbf {c_j}})}{\sum _{c \in C} \exp ({\mathbf {w_i}} \cdot {\mathbf {c}})}. \end{aligned}$$
(2)
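The step from the KL divergence to Eq. 2 can be spelled out as follows. This is a sketch that assumes the per-worker KL terms are simply summed over all workers. Since \(h_{ij}\) sums to 1 over the categories of each worker, \(d_i = 1\) and \({\hat{p}}(c_j|w_i) = h_{ij}\), so

$$\begin{aligned} \sum _{w_i \in W} {\mathrm{KL}}\left( {\hat{p}}(\cdot |w_i) \,\Vert \, p(\cdot |w_i)\right)&= \sum _{e_{ij} \in E_{WC}} h_{ij} \log \frac{h_{ij}}{p(c_j|w_i)} \\&= \sum _{e_{ij} \in E_{WC}} h_{ij} \log h_{ij} - \sum _{e_{ij} \in E_{WC}} h_{ij} \log p(c_j|w_i). \end{aligned}$$

The first sum does not depend on the embedding vectors, so minimizing the KL divergence is equivalent to minimizing the second term, which is \(O_{WC}\).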

3.2 Group Interaction Modeling

In a similar way, we construct a bipartite graph, i.e., \({\mathcal {G}}_{GC} = (G \cup C, E_{GC})\), to represent the interactions between groups and task categories, where G is a set of groups, \(G \cup C\) is the node set of \({\mathcal {G}}_{GC}\), and \(E_{GC}\) represents a set of edges between groups and task categories. There exists an edge \(e_{ij}\) (\(\in E_{GC}\)) between group \(g_i\) (\(\in G\)) and task category \(c_j\) (\(\in C\)) if this group of workers has performed the tasks with category \(c_j\). Moreover, the weight \(h_{ij}\) of the edge \(e_{ij}\) is simply set as \(h_{ij} = \frac{N^{c_j}_{g_i}}{N_{g_i}}\), where \(N^{c_j}_{g_i}\) denotes the number of tasks (with category \(c_j\)) worker group \(g_i\) has performed and \(N_{g_i}\) denotes the total number of tasks \(g_i\) has performed. Let \({\mathbf {g_i}}\) be the embedding vector for group \(g_i\) and \({\mathbf {c_j}}\) be the embedding vector for task category \(c_j\). Our target is to obtain an embedding vector for each worker group to estimate the preference on all the task categories.

The objective function for the group–task interaction data, which is analogous to that for the worker–task interaction data, is defined as follows:

$$\begin{aligned} O_{GC} = - \sum _{e_{ij} \in E_{GC}} h_{ij} \log p(c_j|g_i) = - \sum _{e_{ij} \in E_{GC}} \frac{N^{c_j}_{g_i}}{N_{g_i}} \log \frac{\exp ({\mathbf {g_i}} \cdot {\mathbf {c_j}})}{\sum _{c \in C} \exp ({\mathbf {g_i}} \cdot {\mathbf {c}})}. \end{aligned}$$
(3)

Nevertheless, in reality, there are few persistent groups, while a large number of occasional groups form in an ad hoc manner to perform a task in spatial crowdsourcing. As a result, the group–task interaction data are extremely sparse, given the cold-start nature (i.e., there is no or little group–task interaction) of occasional groups, which makes it difficult to directly learn the embedding vector of an occasional group. To tackle the sparsity and cold-start issue, we aggregate the embeddings of all the members in a group from the group–task interaction data. We observe that in decisions such as task selection, some group members may be more outspoken than others in expressing their preferences (due to prestige, authority, or other personality factors) and thus are more influential on the group’s choice of tasks. In addition, the same worker may contribute differently to decision-making in different groups. Therefore, we introduce a coefficient \(\alpha (k,i)\) to learn the weight of worker \(w_k\) in group \(g_i\), which represents the group-aware personal social impact of \(w_k\) in deciding the choice of group \(g_i\) on tasks. Specifically, given an occasional group \(g_i\), we define the embedding vector \({\mathbf {g_i}}\) as follows:

$$\begin{aligned} {\mathbf {g_i}} = \sum _{w_k \in g_i} \alpha (k,i) {\mathbf {w_k}}, \end{aligned}$$
(4)

where \(\alpha (k,i)\) is a learnable parameter (where a higher value indicates greater impact on a group’s decision) and \({\mathbf {w_k}}\) denotes the embedding of worker \(w_k\).

However, occasional groups temporarily gather together to perform a task in a time instance. It is difficult to learn the coefficient \(\alpha (k,i)\) directly from the group–task interaction data because of the extreme data sparsity problem. Therefore, we introduce an additional positive numerical value \(\lambda _k\) for each worker \(w_k\) representing the global personal social impact, which does not depend on specific groups. We employ \(\exp (\lambda _k)\) to represent the relative impact on deciding a group choice on tasks. Thus, \(\alpha (k,i)\) can be calculated in Eq. 5, which is inspired by the attention mechanism [2].

$$\begin{aligned} \alpha (k,i) = \frac{\exp (\lambda _k)}{\sum _{w_j \in g_i} \exp (\lambda _j)}. \end{aligned}$$
(5)

Once we obtain \(\lambda _k\), which represents the global personal social impact of each worker \(w_k\), we can easily obtain \(\alpha (k,i)\), which represents the group-aware personal social impact in a group. However, if a worker has participated in only a few group activities, the estimate of her \(\lambda _k\) may suffer from over-fitting. Moreover, if a worker has never attended any group activity, we cannot learn her global personal social impact at all. As a result, we cannot learn a satisfactory social impact from the group–task interaction data alone.
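To make Eqs. 4 and 5 concrete, the sketch below turns global impacts \(\lambda _k\) into group-aware weights and then into a group embedding. All numeric values are hypothetical, and `group_embedding` is an illustrative helper rather than the authors' implementation.

```python
import math


def group_embedding(members, lam, worker_vecs):
    """Eq. 5 then Eq. 4.

    Softmax the members' global impacts lambda_k into group-aware weights
    alpha(k, i), then return the weights and the weighted sum of embeddings.
    """
    m = max(lam[k] for k in members)               # shift for numerical stability
    exps = {k: math.exp(lam[k] - m) for k in members}
    z = sum(exps.values())
    alpha = {k: e / z for k, e in exps.items()}
    dim = len(worker_vecs[members[0]])
    g = [sum(alpha[k] * worker_vecs[k][d] for k in members) for d in range(dim)]
    return alpha, g


lam = {"w1": 2.0, "w2": 1.0, "w3": 1.0}            # hypothetical lambda_k values
vecs = {"w1": [1.0, 0.0], "w2": [0.0, 1.0], "w3": [0.0, 1.0]}

alpha_a, g_a = group_embedding(["w1", "w2"], lam, vecs)
alpha_b, g_b = group_embedding(["w2", "w3"], lam, vecs)

category = [0.5, 0.5]                              # hypothetical category embedding
pref_a = sum(x * y for x, y in zip(g_a, category)) # group preference via dot product
```

Worker `w2` has the same \(\lambda\) in both groups but a smaller weight next to the more influential `w1` than next to her equal `w3`, which is the group-aware behavior the coefficient is meant to express.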

In order to improve the accuracy of global personal social impact estimation, we construct a workers’ social network based on both worker–task and group–task interaction data, based on which we extract the social network information, which benefits workers’ global social impact estimation. In the social network, each worker maps to a node and an edge exists if two workers have cooperated with each other in the same group. The weight of the edge is set as the number of cooperations between the workers. Each worker (node) is associated with the number of tasks she has completed. Then, we extract the social network structure information by various measures (e.g., degree centrality and betweenness centrality) and integrate the social network structure information into the learning process of worker’s global social impact, which effectively alleviates the cold-start problem in group–task interaction data.
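The construction of the cooperation network can be sketched as follows. This is a simplified pure-Python illustration that assumes the group history is available as lists of worker identifiers and computes only degree centrality (normalized to [0, 1]); the function names are hypothetical, and other measures such as betweenness centrality are omitted.

```python
from collections import Counter
from itertools import combinations


def build_social_network(group_history):
    """group_history: list of worker-id lists, one per completed group task.

    Returns {frozenset({a, b}): number of cooperations} as weighted edges.
    """
    edges = Counter()
    for group in group_history:
        for a, b in combinations(sorted(set(group)), 2):
            edges[frozenset((a, b))] += 1
    return edges


def degree_centrality(edges, workers):
    """Fraction of the other workers each worker has cooperated with."""
    neighbours = {w: set() for w in workers}
    for pair in edges:
        a, b = tuple(pair)
        neighbours[a].add(b)
        neighbours[b].add(a)
    n = len(workers)
    return {w: (len(nb) / (n - 1) if n > 1 else 0.0)
            for w, nb in neighbours.items()}


workers = ["w1", "w2", "w3", "w4"]
history = [["w1", "w2"], ["w1", "w2"], ["w1", "w3"]]
edges = build_social_network(history)
centrality = degree_centrality(edges, workers)
```

Worker `w4` has never joined a group, so her centrality is 0; such structural features still give a prior for workers with little or no group–task history.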

In particular, we can calculate a social network feature vector \({\mathbf {\beta _k}}\) for worker \(w_k\) and employ a feature selector vector \({\mathbf {h}}\) to assign different weights to different structure features [30]. We normalize all the feature values into the range [0,1]. Then, we take the dot product between the social network feature vector \({\mathbf {\beta _k}}\) and the feature selector vector \({\mathbf {h}}\), plus a bias term b, as the mean of a Gaussian prior for the global personal social impact of the worker, i.e., \(\lambda _k \sim {\mathcal {N}}({\mathbf {\beta _k}} \cdot {\mathbf {h}} + b, \rho _V^2)\). Because the global personal social impact may be affected by other unknown factors, we let \(\lambda _k\) follow this normal distribution rather than fixing it to \({\mathbf {\beta _k}} \cdot {\mathbf {h}} + b\), which makes the learned global social impact more robust.

In terms of the objective function, we should add a corresponding regularization term \(R_V\), i.e., \(\frac{1}{2\rho _V^2} \sum _{w_k \in W}(\lambda _k - ({\mathbf {\beta _k}} \cdot {\mathbf {h}} + b))^2\), into the objective function since we introduce a Gaussian prior for the personal social impact parameter \(\lambda _k\). The hyper-parameter \(\rho _V^2\) (i.e., variance) can control the weight of the regularization term. Therefore, the new objective function is as follows:

$$\begin{aligned} O_{\mathrm{VGC}} = O_{\mathrm{GC}} + R_V. \end{aligned}$$
(6)
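The regularization term \(R_V\) is a scaled sum of squared deviations between each \(\lambda _k\) and its prior mean. A small sketch with made-up values follows; the function name `regularizer` and its argument layout are assumptions for illustration.

```python
def regularizer(lam, beta, h, b, rho_v_sq):
    """R_V = 1 / (2 * rho_V^2) * sum_k (lambda_k - (beta_k . h + b))^2."""
    total = 0.0
    for k, lam_k in lam.items():
        prior_mean = sum(x * y for x, y in zip(beta[k], h)) + b
        total += (lam_k - prior_mean) ** 2
    return total / (2.0 * rho_v_sq)


beta = {"w1": [1.0, 0.5], "w2": [0.0, 0.0]}   # normalized features in [0, 1]
h = [1.0, 2.0]                                # hypothetical feature selector
lam = {"w1": 2.5, "w2": 1.5}                  # hypothetical global impacts

r_v = regularizer(lam, beta, h, b=0.5, rho_v_sq=1.0)
r_v_tight = regularizer(lam, beta, h, b=0.5, rho_v_sq=0.25)
```

A smaller variance \(\rho _V^2\) increases the penalty for the same deviations, which pulls \(\lambda _k\) more strongly toward the value predicted from the social network features.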

Considering the cold-start issue in group–task interaction data, we combine worker–task interaction data with group–task interaction data during the optimization process. More specifically, we design a joint optimization approach, which can simultaneously learn the embedding vectors of workers and task categories from the worker–task interaction data and group–task interaction data. Besides, the global social impact of workers can be learned during the optimization process. Therefore, we combine \(O_{\mathrm{VGC}}\) and \(O_{\mathrm{WC}}\) to form a joint objective function, which is simply defined as follows:

$$\begin{aligned} O_{\mathrm{GWC}} = O_{\mathrm{VGC}} + O_{\mathrm{WC}}. \end{aligned}$$
(7)

Here, we adopt the standard stochastic gradient descent (SGD) strategy [4] to minimize the objective function \(O_{\mathrm{GWC}}\) in Eq. 7, as a result of which each worker’s embedding vector \({\mathbf {w}}\), each task category’s embedding vector \({\mathbf {c}}\) and the model parameters (i.e., \(\lambda _k, {\mathbf {h}}\)) can be learned. We can calculate the coefficient \(\alpha (k,i)\) representing the group-aware personal social impact according to Eq. 5. Then, each group’s embedding vector \({\mathbf {g}}\) can be correspondingly obtained based on Eq. 4. Finally, we take the dot product between each group’s embedding vector and each task category’s embedding vector to obtain the preference of each group on each task category.

3.3 Deep Social Impact Learning

Since the underlying network structures are complex, shallow models cannot capture the highly nonlinear network structures. In order to tackle this problem, we adopt the stacked denoising autoencoders (SDAE) [28] model to learn the latent representation of social network features for worker \(w_i\), denoted as \({\mathbf {x_i}}\), from the original social network feature vector \({\mathbf {\beta _i}}\). This is inspired by the success of deep learning, which has a powerful representation ability to learn complex structures of data [3]. As an unsupervised neural network model, SDAE learns a representation of the corrupted input data from which the clean input can be predicted at the output. SDAE consists of two parts, the encoder and the decoder, each of which contains multiple nonlinear functions (i.e., layers): the encoder maps the corrupted input to a latent representation space, and the decoder maps the latent representation back into the input space to reconstruct the original clean input. The composition of multiple layers of nonlinear functions can map the data into a highly nonlinear space, thereby being able to capture the highly nonlinear network structures.

More specifically, we take the initial social network features of all workers (denoted as \({\mathbf{X}}_c\)) as input and then corrupt the initial input to get a partially destroyed version (denoted as \({\mathbf{X}}_0\)) by randomly choosing some elements of \({\mathbf{X}}_c\) to be forced to 0. Given the corrupted input \({\mathbf{X}}_0\), the output of each layer k, denoted as \({\mathbf{X}}_k\), can be generated by the means of \({\mathbf{X}}_k \sim {\mathcal {N}}(\sigma ({\mathbf{X}}_{k-1} {\mathbf{W}}_k + {\mathbf{b}}_k), \rho _X^2 {\mathbf{I} })\), where \({\mathbf{I} }\) denotes the identity matrix and \(\rho _X^2\) is a regularization hyperparameter. The weight and bias parameters of layer k can be represented as \({\mathbf{W }}_k\) (i.e., \({\mathbf{W} }_k \sim {\mathcal {N}}({\mathbf{0} }, \rho _W^2 {\mathbf{I} })\)) and \({\mathbf{b} }_k\) (i.e., \({\mathbf{b} }_k \sim {\mathcal {N}}({\mathbf{0} }, \rho _b^2 {\mathbf{I} })\)), respectively, where \(\rho _W^2\) and \(\rho _b^2\) are regularization hyperparameters. Following the process above, the optimization function of SDAE can be defined as follows:

$$\begin{aligned}O_{\mathrm{SDAE}} &= \frac{1}{2} \sum _{k} \left( \frac{\left\| \sigma ({\mathbf{X} }_{k-1} {\mathbf{W }}_k + {\mathbf{b} }_k)-{\mathbf{X} }_{k}\right\| _2^2}{\rho _X^2} + \frac{\left\| {\mathbf{W} }_k\right\| _2^2}{\rho _W^2} \right. \nonumber \\&\left. \quad + \frac{\left\| {\mathbf{b} }_k\right\| _2^2}{\rho _b^2}\right) , \end{aligned}$$
(8)

where \(||\cdot ||_2^2\) denotes the squared Euclidean norm and \(\sigma (\cdot )\) is a sigmoid function. The output of the hidden layer in the middle is the latent representation of the social network features. Then, we can take the dot product between the latent vector \({\mathbf {x_i}}\) and the feature selector vector \({\mathbf {h}}\) as the Gaussian prior for the global personal social impact of worker \(w_i\), i.e., \(\lambda _i \sim {\mathcal {N}}({\mathbf {x_i}} \cdot {\mathbf {h}} + b, \rho _V^2)\), where \({\mathbf {x_i}}\) is the ith row of the latent representation of the social network features, b is a bias term, \(\rho _V^2\) is a hyperparameter and \(\lambda _i\) denotes the global personal social impact.
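The corruption step and one layer's contribution to Eq. 8 can be sketched in plain Python as follows. This is only an illustration under simplifying assumptions: matrices are nested lists, the corruption is masking noise with a hypothetical drop probability, and no training loop is shown; the helper names are not taken from the original work.

```python
import math
import random


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def corrupt(X, drop_prob, rng):
    """Masking noise: force each element to 0 with probability drop_prob."""
    return [[0.0 if rng.random() < drop_prob else v for v in row] for row in X]


def layer_forward(X, W, b):
    """sigma(X W + b) for nested lists; W has shape (in_dim, out_dim)."""
    return [[sigmoid(sum(x * W[i][j] for i, x in enumerate(row)) + b[j])
             for j in range(len(b))] for row in X]


def layer_loss(X_prev, X_k, W, b, rho_x_sq, rho_w_sq, rho_b_sq):
    """One layer's summand in Eq. 8 (including the leading factor 1/2)."""
    pred = layer_forward(X_prev, W, b)
    recon = sum((p - t) ** 2
                for p_row, t_row in zip(pred, X_k)
                for p, t in zip(p_row, t_row))
    w_norm = sum(v * v for row in W for v in row)
    b_norm = sum(v * v for v in b)
    return 0.5 * (recon / rho_x_sq + w_norm / rho_w_sq + b_norm / rho_b_sq)


rng = random.Random(0)
X_c = [[0.2, 0.8], [1.0, 0.4]]             # clean features, one row per worker
X_0 = corrupt(X_c, drop_prob=0.3, rng=rng) # partially destroyed input

W_1 = [[1.0], [-1.0]]                      # hypothetical layer-1 weights
b_1 = [0.0]
loss = layer_loss([[0.0, 0.0]], [[0.5]], W_1, b_1, 1.0, 1.0, 1.0)
```

In the last line the layer output matches its target exactly, so the only contribution to the loss comes from the weight penalty.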

Finally, we combine the SDAE model with our proposed group interaction learning framework and the optimization function in Eq. 6 can be converted into:

$$\begin{aligned} O_{\mathrm{VGC}} = O_{\mathrm{GC}} + O_{\mathrm{SDAE}} + R_V, \end{aligned}$$
(9)

where \(O_{\mathrm{GC}}\) denotes the objective function in group–task interaction data and \(R_V\) is a regularization term. Then, we can employ the original method (i.e., SGD) to optimize the objective function in Eq. 7.

4 Group Task Assignment

In this section, we first generate the available worker groups for each task based on the trip constraints (i.e., workers’ reachable range, workers’ available time and tasks’ expiration time). Then, we calculate the consensus scores of available worker groups, and finally a tree-decomposition-based algorithm [32, 35] is employed to achieve the optimal task assignment.

4.1 Available Worker Group Set Generation

4.1.1 Finding the Reachable Workers for Each Task

Due to the constraints of workers’ reachable distance, workers’ available time and tasks’ expiration time, each task can be completed by only a small subset of workers in a time instance. Therefore, we first find the set of workers that can complete each task without violating the constraints. The reachable worker subset for a task s, denoted as \({\mathrm{RW}}_s\), should satisfy the following conditions: \({\forall }w \in {\mathrm{RW}}_s\):

  1. \(t_{\mathrm{now}} + t(w.l, s.l) \le s.e\), and

  2. \({\mathrm{d}}(w.l, s.l) \le w.r\), and

  3. \(t_{\mathrm{now}} + t(w.l, s.l) \le w.off,\)

where \(t_{\mathrm{now}}\) denotes the current time, \(t(w.l, s.l)\) is the travel time from w.l to s.l and \({\mathrm{d}}(w.l, s.l)\) denotes the travel distance (e.g., Euclidean distance) between w.l and s.l. The above three conditions guarantee that a worker w can travel from her location w.l to a task s (which is located in her reachable range) directly before task s expires and during worker w’s available time.

4.1.2 Finding the Available Worker Group Sets for Each Task

Given the reachable workers for each task s, we next find the set of available worker groups, denoted as \({\mathbb {AWG}}(s)\), under the constraints of the group members’ available time and the number of workers allowed to be assigned to perform task s. Each available worker group in \({\mathbb {AWG}}(s)\), denoted as \({\mathrm{AWG}}(s)\), should satisfy the following conditions:

  1. \(\vert {\mathrm{AWG}}(s) \vert = s.numW\), and

  2. \({\forall } w_j, w_k \in {\mathrm{AWG}}(s), t_{\mathrm{now}} + t(w_j.l, s.l) \le w_k.off,\)

where \(\vert {\mathrm{AWG}}(s) \vert \) is the number of workers in \({{\mathrm{AWG}}}(s)\). The above two conditions guarantee that workers in a group can arrive at the location of task s without violating the available time of each other.
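The two conditions above can be sketched by enumerating all subsets of the reachable workers of size s.numW. Condition 2 holds for every pair exactly when the latest arrival in the group is no later than the earliest off time in the group, which is what the check below uses. The tuple layout `(worker_id, travel_time, off_time)` and the function name are illustrative assumptions.

```python
from itertools import combinations


def available_worker_groups(reachable, num_w, t_now):
    """Enumerate AWG(s) for one task.

    reachable: list of (worker_id, travel_time_to_task, off_time) tuples
               for the workers in RW_s (hypothetical layout).
    num_w:     s.numW, the required group size.
    """
    groups = []
    for combo in combinations(reachable, num_w):
        latest_arrival = max(t_now + tt for _, tt, _ in combo)
        earliest_off = min(off for _, _, off in combo)
        # Equivalent to: for all w_j, w_k, t_now + t(w_j.l, s.l) <= w_k.off
        if latest_arrival <= earliest_off:
            groups.append(tuple(wid for wid, _, _ in combo))
    return groups
```

The enumeration is exponential in s.numW in the worst case, so this sketch is only practical when the reachable worker sets and group sizes are small.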

4.2 Consensus Calculation

Group task assignment aims to assign each task to a group of workers who show interest in and preference for the task. However, group members may not always share the same tastes. Therefore, the group task assignment process should manage the heterogeneity of worker groups. In this part, we design a consensus score for each available worker group, which contains two main aspects: group preference and group members’ disagreement. In particular, the group preference reflects the degree to which the task is preferred by all the members. The more the group members prefer a task, the higher its consensus score should be for the group. The group members’ disagreement reflects the level at which members disagree with each other. Group members who disagree with each other are not willing to conduct a task together, leading to a low consensus score for the group.

We next calculate the consensus score for each available worker group that consists of two components, i.e., group preference and group members’ disagreement. More specifically, the group preference is the social impact-based preference, which is given in Sect. 3. The disagreement of a group g over a task category c, denoted as \({\mathrm{dis}}(g, c)\), is the deviation of individual worker preference from her group’s average preference. Therefore, the consensus score of g for task category c, denoted by \({\mathrm{con}}(g, c)\), can be computed by the following formula:

$$\begin{aligned} {\mathrm{con}}(g, c)= & \, {\mathbb {P}}(g, c) \cdot \eta (g, c) , \end{aligned}$$
(10)
$$\begin{aligned} {\mathbb {P}}(g, c)= & \, {\mathbf {g}} \cdot {\mathbf {c}}, \end{aligned}$$
(11)
$$\begin{aligned} \eta (g, c)= & \, 1 - \min \{ 1, {\mathrm{dis}}(g,c) \}, \end{aligned}$$
(12)
$$\begin{aligned} {\mathrm{dis}}(g,c)= & \, \frac{1}{\vert g \vert } \sum _{w \in g}({\mathbb {P}}_{\mathrm{indiv}}(w, c) - {\mathbb {P}}_{\mathrm{mean}}(g, c))^2, \end{aligned}$$
(13)
$$\begin{aligned} {\mathbb {P}}_{\mathrm{indiv}}(w, c)= & \, {\mathbf {w}} \cdot {\mathbf {c}}, \end{aligned}$$
(14)
$$\begin{aligned} {\mathbb {P}}_{\mathrm{mean}}(g, c)= & \, \frac{1}{\vert g \vert } {\sum _{w\in g}{\mathbb {P}}_{\mathrm{indiv}}(w, c)}, \end{aligned}$$
(15)

where \({\mathbb {P}}(g, c)\) denotes the preference of group g for task category c by taking the dot product between group vector \({\mathbf {g}}\) and task category vector \({\mathbf {c}}\) (shown in Eq. 11). \(\eta (g, c)\) is a function calculating the discount to the group’s preference on the basis of group members’ disagreement, and \({\mathrm{dis}}(g,c)\) is the disagreement among group members that represents the deviation of group members’ preferences from the group’s average preference. \({\mathbb {P}}_{\mathrm{indiv}}(w, c)\) denotes the preference of individual worker w for task category c, which can be computed by taking the dot product between worker vector \({\mathbf {w}}\) and task category \({\mathbf {c}}\) (shown in Eq. 14) and \({\mathbb {P}}_{\mathrm{mean}}(g, c)\) denotes the mean of the preferences of all the members in group g for task category c. \({\mathbf {g}}\), \({\mathbf {c}}\) and \({\mathbf {w}}\) are obtained in Sect. 3. Note that we normalize the preference values (e.g., \({\mathbb {P}}(g, c)\) and \({\mathbb {P}}_{\mathrm{indiv}}(w, c)\)) to lie between 0 and 1, using a Min–Max normalization procedure.
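Eqs. 10 to 15 can be sketched in a few lines. The snippet below assumes that the preference values have already been Min–Max normalized to [0, 1] and are passed in directly rather than recomputed from the vectors; the function names are illustrative assumptions.

```python
def disagreement(member_prefs):
    """Eq. 13: mean squared deviation of the members' preferences
    from the group's mean preference (Eq. 15)."""
    mean_pref = sum(member_prefs) / len(member_prefs)
    return sum((p - mean_pref) ** 2 for p in member_prefs) / len(member_prefs)


def consensus(group_pref, member_prefs):
    """Eq. 10: group preference discounted by eta(g, c) from Eq. 12.

    group_pref:   normalized P(g, c)
    member_prefs: normalized P_indiv(w, c) for every w in g
    """
    eta = 1.0 - min(1.0, disagreement(member_prefs))
    return group_pref * eta


# A group whose members agree keeps its full preference,
# while a polarized group is discounted.
agreeing = consensus(0.8, [0.7, 0.7])
polarized = consensus(0.8, [0.9, 0.5])
```

Because the normalized preferences lie in [0, 1], the disagreement of Eq. 13 is at most 0.25, so in this setting the discount \(\eta (g, c)\) stays between 0.75 and 1.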

4.3 Optimal Group Task Assignment Algorithm

It is easy to see that the globally optimal result is the union of one possible available worker group (AWG) for each task. We adopt a tree-decomposition-based strategy [32, 35] to achieve the optimal task assignment, i.e., the one with the maximal number of task assignments, in which we give higher priority to the worker groups with higher consensus scores on tasks. More specifically, we first construct a task dependency graph, G(V, E), according to the dependency relationship among tasks. (Two tasks are dependent on each other if they share available workers; otherwise, they are independent.) Each vertex \(v \in V\) represents a task \(s_v \in S\). There exists an edge \(e(u, v) \in E\) between u and v if the two tasks \(s_u\) and \(s_v\) are dependent on each other. Subsequently, we utilize a tree decomposition strategy to separate all tasks into independent clusters, which are the maximal cliques of the task dependency graph. Then, we utilize a recursive tree construction (RTC) algorithm [32, 35] to organize them into a balanced tree structure, such that the tasks in sibling nodes of the tree do not share the same available workers. Facilitated by such a tree structure, we can solve the optimal assignment subproblem on each sibling node independently. Finally, the optimal assignment result can be found by a depth-first search through the tree, during which we assign tasks to the available worker groups with higher consensus scores on tasks.
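The first step, constructing the task dependency graph, can be sketched as follows. The second function splits the graph into connected components, which is a deliberately simplified stand-in: it is not the tree decomposition or the RTC algorithm of [32, 35], and only illustrates that tasks in different components share no workers and can therefore be solved independently. The input layout and function names are assumptions.

```python
from itertools import combinations


def build_dependency_graph(task_workers):
    """task_workers: dict mapping task id -> set of available worker ids.
    Two tasks are connected if they share at least one available worker."""
    graph = {s: set() for s in task_workers}
    for u, v in combinations(task_workers, 2):
        if task_workers[u] & task_workers[v]:
            graph[u].add(v)
            graph[v].add(u)
    return graph


def independent_components(graph):
    """Connected components via iterative depth-first search
    (simplified substitute for the tree decomposition step)."""
    seen, components = set(), []
    for start in graph:
        if start in seen:
            continue
        stack, component = [start], set()
        while stack:
            node = stack.pop()
            if node in seen:
                continue
            seen.add(node)
            component.add(node)
            stack.extend(graph[node] - seen)
        components.append(component)
    return components
```

A full implementation would further decompose each component into a tree of clusters so that sibling subproblems can be searched independently, as described above.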

5 Experiment

In this section, we conduct extensive experiments on a real-world dataset to evaluate the performance of our proposed algorithms. All the algorithms are implemented on an Intel(R) Xeon(R) CPU E5-2650 v4 @ 2.20 GHz with 256 GB RAM.

5.1 Experiment Setup

We conduct our experiments on a check-in dataset from Twitter, which provides check-in data across the USA (except Hawaii and Alaska) from September 2010 to January 2011, including the locations of 62,462 venues and 61,412 users. The dataset is widely used in the evaluation of SC platforms [11]. Since the dataset lacks category information for venues, we generate the category information (i.e., task category information) associated with each venue from Foursquare with the aid of its API. When using the dataset in our experiments, we assume that the users in the dataset are the workers of the SC platform, since users who check in to different spots may be good candidates to perform spatial tasks in the vicinity of those spots, and their locations are those of their most recent check-in points. We assume the spots are the tasks of the SC platform and use each spot’s location and earliest check-in time of the day as the location and publish time of a task, respectively. We consider each worker’s average travel distance as the worker’s movement speed. We extract 20 kinds of check-in categories to simulate the task categories, i.e., the categories of check-ins. Checking in at a spot is equivalent to accepting a task.

As Twitter does not contain explicit group information, we extract implicit group task completion activities as follows: if a set of users visit the same spot, or different spots of the same category that are near each other (e.g., the distance between any two spots is less than 10 km in our experiments), within one hour, they are regarded as the members of a group.

The default values of all parameters used in our experiments are summarized in Table 2.

Table 2 Experiment parameters

5.2 Experimental Results

5.2.1 Performance of Preference Modeling

In this experiment, we evaluate the efficiency of the worker groups’ preference modeling phase and its impact on the subsequent task assignment. Specifically, we compare the efficiency (i.e., CPU time) of the following algorithms:

  1. AVG: Average preference calculation (AVG) method, where the average preference of a group g is computed by \(\frac{N^{c}_{g}}{N_{g}}\). \(N^{c}_{g}\) denotes the number of tasks (with category c) worker group g has performed and \(N_{g}\) denotes the total number of tasks g has performed.

  2. SIP: the social impact-based preference (SIP) modeling algorithm by employing the linear approach for exploiting social network features.

  3. SIP+SDAE: the social impact-based preference (SIP) modeling algorithm by employing stacked denoising autoencoders (SDAE).
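For concreteness, the AVG baseline can be sketched as a frequency count over a group’s task history. The input format (a list with one category label per performed task) and the function name are illustrative assumptions.

```python
from collections import Counter


def avg_preference(history):
    """AVG baseline: preference of a group for each category c is
    N_g^c / N_g, where history lists the category of every task
    the group has performed (hypothetical input format)."""
    total = len(history)
    if total == 0:
        return {}
    return {c: n / total for c, n in Counter(history).items()}


# Hypothetical history of one group: three food tasks and one park task.
prefs = avg_preference(["food", "food", "park", "food"])
```

Categories the group has never performed receive no entry, i.e., an implicit preference of zero, which is one reason a purely frequency-based baseline can be uninformative for groups with sparse histories.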

To assess the effectiveness of task assignment based on the above preference modeling methods, we compare the assignment success rate (ASR) obtained by applying the optimal group task assignment (OGTA) algorithm. ASR is the ratio of successful assignments to the total assignments for all workers in a time instance. Note that once all the group members actually perform (check in) the tasks (spots) with the same category which are near each other (e.g., the distance between the tasks is less than 10 km in our experiments) within one hour, we regard this task assignment as a successful assignment.

Fig. 3 Performance of preference modeling: effect of \(e-p\)

Effect of \(e-p\). First, we investigate how the valid time of tasks, \(e-p\), affects the efficiency of preference modeling and the effectiveness of group task assignment. As we can see from Fig. 3a, the CPU time of all the methods increases gradually as \(e-p\) grows, as there are more workers and tasks to be processed. SIP+SDAE and SIP have similar CPU time since both of them calculate the group preference by taking the dot product between the group vector and the task category vector (which are obtained by the SDAE method and the linear method, respectively, in the training phase). Although AVG is the least time-consuming, it performs worst in terms of ASR (Fig. 3b). In Fig. 3b, the optimal group task assignment algorithm based on SIP+SDAE achieves the highest accuracy, followed by SIP and AVG. Compared with SIP, the assignment accuracy of SIP+SDAE is higher, which confirms the benefit of taking SDAE into consideration. This is due to the fact that SDAE can better capture the nonlinear interactions among social network features to improve the estimation of workers’ global personal social impact.

Fig. 4 Performance of preference modeling: effect of off–on

Effect of off–on. Figure 4 illustrates the effect of off–on on the performance of all algorithms. As expected, Fig. 4a shows that increasing workers’ available time incurs more CPU time for all the algorithms. This may be due to the fact that, when the available time is more relaxed, more worker groups are generated, which means we have to compute the preferences for more worker groups. The ASR values of all the methods improve with increasing off–on, as depicted in Fig. 4b, since a worker group has a better chance of being assigned tasks it is interested in when off–on grows. SIP+SDAE performs better than SIP in terms of ASR, confirming the advantage of using SDAE during preference modeling. The reason behind it is that the workers’ personal social impact learned by the deep learning model is more accurate.

Fig. 5 Performance of preference modeling: effect of r

Effect of r. Next, we study the effect of workers’ reachable range r. The CPU time of all approaches grows as r is enlarged (Fig. 5a), for a reason similar to that for the tasks’ valid time, i.e., the larger the workers’ reachable regions are, the more available worker groups need to be processed for each task. As shown in Fig. 5b, the ASR of all the approaches increases with r, since group members have a better chance of being assigned tasks they are interested in when r is larger, and SIP+SDAE outperforms the other methods.

Fig. 6 Performance of preference modeling: effect of numW

Effect of numW. We study the effect of numW in this set of experiments. As demonstrated in Fig. 6a, the CPU time gradually decreases as the number of workers in each group is enlarged, since the number of available worker groups for each task decreases when numW gets larger. The ASR of all methods shows a decreasing trend (Fig. 6b), because with fewer available worker groups it is harder to assign the tasks to suitable groups. However, the optimal group task assignment method based on SIP+SDAE still achieves a higher ASR than the other methods, which confirms the effectiveness of our proposed methods.

Fig. 7 Performance of preference modeling: effect of |S|

Effect of |S|. In this part of the experiments, we evaluate the scalability of all the approaches by varying the number |S| of tasks from 1 k to 5 k. As depicted in Fig. 7a, the CPU time of all the methods grows when |S| increases, since more tasks and available worker groups need to be processed. The optimal group task assignment method based on SIP+SDAE still outperforms the other methods (i.e., SIP and AVG) in ASR, as shown in Fig. 7b.

Fig. 8 Performance of preference modeling: effect of |W|

Effect of |W|. Finally, we measure the performance of our methods by expanding the number of workers (|W|) from 1 k to 5 k. As Fig. 8a shows, the running time of all methods increases when the number of workers gets larger. The main reason is that the number of available workers to be assigned grows with |W|, which in turn leads to a longer time cost. The assignment success rate of all algorithms shows an increasing trend, as shown in Fig. 8b. SIP+SDAE has the highest assignment success rate, demonstrating the superiority of our proposed method in modeling worker groups’ preference.

5.2.2 Performance of Group Task Assignment

In this part, we evaluate the efficiency and effectiveness of the group task assignment approaches in terms of CPU time, assignment success rate (ASR) and the overall number of task assignments. Specifically, the CPU time is given by the average time cost of performing task assignment at each time instance, while the ASR and the number of assigned tasks measure the quality of the task assignment strategies. We compare and evaluate the performance of the following methods:

  1. OGTA: optimal group task assignment (OGTA) algorithm based on tree decomposition algorithm without considering worker groups’ preference.

  2. SIP+SDAE-OGTA: OGTA algorithm with worker groups’ social impact-based preference calculated by the SIP+SDAE method.

  3. SIP+SDAE+DIS-GGTA: greedy group task assignment (GGTA) algorithm with both worker groups’ social impact-based preference (calculated by the SIP+SDAE method) and group members’ disagreement (DIS). A basic greedy task assignment algorithm is introduced to assign each task greedily to the worker groups with the maximal consensus until all the tasks are assigned or all the worker groups are exhausted.

  4. SIP+SDAE+DIS-OGTA: OGTA algorithm with both worker groups’ social impact-based preference (calculated by the SIP+SDAE method) and group members’ disagreement (DIS).
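The greedy baseline (GGTA) can be sketched as follows. This is one plausible reading of the description above, not the exact implementation used in the experiments: candidate (task, group) pairs are visited in descending order of consensus score, and a pair is accepted only if the task is still unassigned and none of the group’s workers has already been used. The restriction that a worker joins at most one group per time instance is an assumption, as are the input layout and function name.

```python
def greedy_group_assignment(candidates):
    """candidates: list of (task_id, group, consensus) triples, where group
    is a tuple of worker ids drawn from the task's available worker groups
    (hypothetical layout). Returns a dict task_id -> assigned group."""
    assignment = {}
    used_workers = set()
    # Visit the highest-consensus candidates first.
    for task, group, _score in sorted(candidates, key=lambda c: c[2], reverse=True):
        if task in assignment or used_workers.intersection(group):
            continue  # task already assigned, or a worker is already busy
        assignment[task] = group
        used_workers.update(group)
    return assignment


# Hypothetical candidates: s1 and s2 compete for overlapping workers.
candidates = [
    ("s1", ("w1", "w2"), 0.9),
    ("s2", ("w2", "w3"), 0.8),
    ("s2", ("w3", "w4"), 0.5),
    ("s1", ("w3", "w4"), 0.7),
]
result = greedy_group_assignment(candidates)
```

Because each choice is made locally, the greedy strategy can block later tasks by consuming their only feasible workers, which is consistent with GGTA assigning fewer tasks than the OGTA-based methods in the results below.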

Effect of \(e-p\). We first study the effect of the valid time \(e-p\) of tasks. As illustrated in Fig. 9a, a longer valid time means that, on average, each worker group has more freedom to choose tasks, which results in a larger search space. SIP+SDAE+DIS-OGTA runs slower than the other methods as it must compute the consensus score of each worker group for the reachable tasks during the task assignment. On the other hand, as shown in Fig. 9b, the accuracy of all algorithms except OGTA has an increasing trend when \(e-p\) grows longer. This is due to the fact that a worker group has a better chance of being assigned tasks it is interested in as the valid time of tasks grows. The ASR of OGTA stays almost constant as it does not consider worker groups’ preference. SIP+SDAE+DIS-OGTA and SIP+SDAE+DIS-GGTA outperform the SIP+SDAE-OGTA algorithm for all values of \(e-p\) in terms of ASR, which indicates that considering the consensus score is beneficial to assignment accuracy. The reason behind it is that it is more desirable to assign a task that each worker group member is interested in than to assign a task that polarizes group members. Although SIP+SDAE+DIS-GGTA is the fastest among all the methods and has a similar ASR to SIP+SDAE+DIS-OGTA, it assigns fewer tasks than the other methods (i.e., OGTA, SIP+SDAE+DIS-OGTA, SIP+SDAE-OGTA), as shown in Fig. 9c.

Fig. 9 Performance of group task assignment: effect of \(e-p\)

Fig. 10 Performance of group task assignment: effect of off–on

Effect of off–on. In this set of experiments, we evaluate the effect of the workers’ available time. As Fig. 10a shows, the running time of all algorithms increases with longer workers’ available time, since there are more available worker groups to be searched for each task. SIP+SDAE+DIS-OGTA is more time-consuming than the other algorithms. The SIP+SDAE+DIS-OGTA and SIP+SDAE+DIS-GGTA methods consistently outperform the other methods in ASR, and the ASR of SIP+SDAE+DIS-OGTA is slightly higher than that of SIP+SDAE+DIS-GGTA (Fig. 10b). When off–on grows, the ASR of all the methods except OGTA shows a tendency similar to that observed for \(e-p\), for the similar reason that worker groups have a better chance of obtaining tasks they are interested in as the available time off–on increases. From Fig. 10c, we can see that the number of task assignments grows quickly when workers’ available time is longer, because there are more available worker groups for each task as workers’ available time gets longer.

Fig. 11 Performance of group task assignment: effect of r

Effect of r. Next, we evaluate the effect of r, the workers’ reachable radius. Not surprisingly, as we can see in Fig. 11a, the running time of all methods increases when r grows, while the computational cost of SIP+SDAE+DIS-OGTA grows faster. The ASR of SIP+SDAE+DIS-OGTA and SIP+SDAE+DIS-GGTA is higher than that of the SIP+SDAE-OGTA and OGTA methods, which confirms the benefit of considering the consensus score for each worker group (Fig. 11b). Although SIP+SDAE+DIS-OGTA and SIP+SDAE+DIS-GGTA show a similar performance in ASR, SIP+SDAE+DIS-OGTA assigns more tasks than SIP+SDAE+DIS-GGTA, as depicted in Fig. 11c. The number of task assignments generated by all approaches grows as r is enlarged, since the larger the workers’ reachable regions are, the better the chance that the SC server can assign more tasks to suitable worker groups.

Fig. 12 Performance of group task assignment: effect of numW

Effect of numW. Figure 12 shows the effect of numW. Figure 12a illustrates the CPU time of different methods. As expected, the running time shows a downward tendency when the number of workers in each group (i.e., numW) gets larger. This is because there are fewer available worker groups for each task as numW gets larger, which reduces the search space. SIP+SDAE+DIS-OGTA is still the most time-consuming method. The assignment success rate of all algorithms declines with numW, as shown in Fig. 12b. However, the methods based on SIP+SDAE+DIS still achieve a higher ASR than the other algorithms, demonstrating the advantage of taking group members’ disagreement into account. In addition, Fig. 12c demonstrates that SIP+SDAE+DIS-GGTA has no advantage over the other methods in the number of task assignments.

Fig. 13 Performance of group task assignment: effect of |S|

Effect of |S|. In this set of experiments, we evaluate the scalability of all the proposed algorithms by changing the number |S| of tasks from 1 k to 5 k. As expected, although the CPU time increases as |S| increases, SIP+SDAE+DIS-OGTA performs well in improving the assignment success rate and the number of task assignments, as demonstrated in Fig. 13b, c. Figure 13a indicates that SIP+SDAE+DIS-GGTA is the most efficient algorithm, while the algorithms based on OGTA run much slower, which is mainly due to the extra time cost of building and searching the tree during the OGTA procedure. In terms of assignment success rate, the accuracy of SIP+SDAE+DIS-OGTA is a bit higher than that of SIP+SDAE+DIS-GGTA and SIP+SDAE-OGTA, while that of OGTA stays almost constant as |S| grows, as shown in Fig. 13b. Similar to the previous results, the OGTA-related algorithms (i.e., SIP+SDAE+DIS-OGTA, SIP+SDAE-OGTA, OGTA) outperform the SIP+SDAE+DIS-GGTA method for all the values of |S| in the number of task assignments, as depicted in Fig. 13c.

Fig. 14 Performance of group task assignment: effect of |W|

Effect of |W|. In our final set of experiments, we investigate how the number of workers affects the efficiency and effectiveness of group task assignment. As Fig. 14a shows, the CPU time increases with a larger number of workers, for a reason similar to that for |S|, i.e., the extra time cost of building and searching the tree during the OGTA procedure with more workers. The optimal group task assignment algorithm considering both worker groups’ preference and group members’ disagreement (SIP+SDAE+DIS-OGTA) performs best in assignment success rate, as shown in Fig. 14b. In Fig. 14c, the number of task assignments increases as |W| gets larger, because more spatial tasks can be conducted by more workers. Although SIP+SDAE+DIS-OGTA and SIP+SDAE+DIS-GGTA show a similar performance in ASR, SIP+SDAE+DIS-OGTA performs better in the number of task assignments, which demonstrates the effectiveness of the optimal task assignment method.

6 Related Work

Spatial crowdsourcing (SC) is a new form of online crowdsourcing, which employs smart device carriers as workers to physically travel to specified locations and perform the requested spatial tasks with various constraints [10, 27, 29, 33, 34]. Most existing research focuses on task assignment [24,25,26]. Kazemi et al. [15] classify SC into two categories, namely server assigned tasks (SAT) and worker selected tasks (WST), based on the task publishing modes. In particular, for the SAT mode, which is popular in existing research, the SC server is responsible for directly assigning proper tasks to nearby workers. Such work aims to maximize the number of assigned tasks after collecting all the locations of workers/tasks on the server side [8, 15, 16, 21], to maximize the reliability-and-diversity score of assignments [9], to maximize the number of accomplished tasks for a worker with an optimal schedule on the client side [12], or to maximize the coverage of required skills of workers [7]. For the WST mode, spatial tasks are published online and then broadcast to all workers, such that workers can choose any task by themselves according to their personal preferences [11].

Meanwhile, quality assurance is a challenging problem that needs to be solved during the process of spatial task assignment. Workers are more likely to honestly and promptly complete the assigned tasks if a quality control strategy is considered, e.g., giving higher priority to workers who are more interested in tasks. Although a few existing approaches consider workers’ preferences for tasks in crowdsourcing [1, 5], they just infer workers’ preferences from historical task-performing records or explicit feedback without taking workers’ social impact into consideration.

Most of the previous studies in spatial crowdsourcing focus on assigning tasks to individual workers, and unfortunately cannot be effectively applied to group task assignment. In recent years, a few studies [6, 13] have addressed group task assignment (also called collaborative task assignment), i.e., assigning tasks to a group of multiple workers. The groups are formed by workers in an ad hoc way, also called occasional groups, who have a shared purpose only at a certain time. Cheng et al. [6] propose a framework called cooperation-aware spatial crowdsourcing (CA-SC), in which they design both a task-priority greedy approach and a game-theoretic approach to solve the CA-SC problem, aiming to achieve high cooperation quality scores. Different from the algorithm proposed by Cheng et al. [6], our proposed algorithm aims to maximize the total number of task assignments and gives higher priorities to worker groups with higher consensus scores (including groups’ preference based on social impact and group members’ disagreement) on tasks.

7 Conclusion

In this paper, we propose a novel task assignment problem, called group task assignment (GTA), in spatial crowdsourcing. In order to achieve effective task assignment, we address a few challenges by proposing different strategies to obtain the social impact-based preferences of different worker groups for each task category and adopting an optimal algorithm to assign tasks. Moreover, we further optimize the original solution by proposing several strategies to improve the effectiveness of group task assignment, wherein a deep learning method is adopted to better learn workers’ social impact-based preferences and the group consensus is taken into consideration. An extensive empirical study demonstrates the effectiveness of our proposed solution.