Appendix 1: Extensions to multi-class and real-valued labels
We give the detailed update rules of the LC methods as modified to handle multi-class and real-valued labels, and then explain how each inference algorithm can be extended to preserve worker privacy.
Appendix 1.1: Multi-class labels
The LC method was originally proposed for multi-class labels by Dawid and Skene (1979). Consider a task in which each instance is assigned a \(K\)-class label (\(K\ge 2\)). For each \(i\in {\mathcal I}\) and \(j\in {\mathcal {J}}\), a crowd label \(y_{i,j}\in \{0,\dots ,K-1\}(=:{\mathcal K})\) is generated by the multinomial distribution with parameters
$$\begin{aligned} \pi _{jkl} = \Pr [y_{i,j} = k \mid y_{i}=l, \Pi _{j}], \end{aligned}$$
where \(\sum _{k\in {\mathcal K}} \pi _{jkl} = 1\) holds for all \(l\in {\mathcal K}\), and we denote \(\Pi _{j} = \{\pi _{jkl} \mid k,l\in {\mathcal K}\}\). Also, for each \(i\in {\mathcal I}\), the true label \(y_i\in {\mathcal K}\) is generated by
$$\begin{aligned} p_l = \Pr [y_{i} = l], \end{aligned}$$
where \(\sum _{l\in {\mathcal K}} p_l = 1\) holds. The model parameters \(\Pi =\bigcup _{j\in {\mathcal {J}}}\Pi _{j}\) and \(\{p_l \mid l\in {\mathcal K}\}\) and the posterior probabilities of the true labels \(\mu _{il} = \Pr [y_i = l \mid {\mathcal Y}, \Pi ]\) are estimated using the following EM algorithm.
- E-step: for each \(i\in {\mathcal I}\), update \(\{\mu _{il} \mid l\in {\mathcal K}\}\) as
$$\begin{aligned} \mu _{il}&= \dfrac{p_l \rho _{il}}{\sum _{l^{\prime }\in {\mathcal K}} p_{l^{\prime }}\rho _{il^{\prime }}},\\ \mathrm{where\ } \log \rho _{il}&= \sum _{j\in {\mathcal {J}}_{i}} \sum _{k\in {\mathcal K}} {\mathbf I}(y_{i,j}=k) \log \pi _{jkl}. \end{aligned}$$
- M-step: for each \(j\in {\mathcal {J}}\), update \(\Pi _j\) as
$$\begin{aligned} \pi _{jkl} = \dfrac{\sum _{i\in {\mathcal I}_j} \mu _{il} {\mathbf I}(y_{i,j} = k)}{\sum _{i\in {\mathcal I}_j} \mu _{il}}, \end{aligned}$$
and for each \(l\in {\mathcal K}\), update \(p_l\) as
$$\begin{aligned} p_l = \dfrac{1}{|{\mathcal I}|}\sum _{i\in {\mathcal I}}\mu _{il}. \end{aligned}$$
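The E-step and M-step above can be sketched in a few lines of plain Python. This is a minimal illustration rather than the authors' implementation: the function name `dawid_skene_em`, the dictionary layout of `labels`, the vote-fraction initialisation of \(\mu _{il}\), and the small smoothing constant `eps` (which keeps every \(\pi _{jkl}\) strictly positive so that its logarithm is defined) are all assumptions introduced for this example.

```python
import math

def dawid_skene_em(labels, n_items, n_workers, K, n_iter=30, eps=1e-6):
    """labels maps (item i, worker j) -> crowd label y_ij in {0, ..., K-1}."""
    # Initialise mu_il from per-item vote fractions (an assumed heuristic).
    mu = [[eps] * K for _ in range(n_items)]
    for (i, _j), k in labels.items():
        mu[i][k] += 1.0
    mu = [[v / sum(row) for v in row] for row in mu]

    for _ in range(n_iter):
        # M-step: pi[j][k][l] = sum_i mu_il I(y_ij = k) / sum_i mu_il,
        # with eps added to each numerator (assumed smoothing).
        pi = [[[eps] * K for _ in range(K)] for _ in range(n_workers)]
        for (i, j), k in labels.items():
            for l in range(K):
                pi[j][k][l] += mu[i][l]
        for j in range(n_workers):
            for l in range(K):
                total = sum(pi[j][k][l] for k in range(K))
                for k in range(K):
                    pi[j][k][l] /= total
        # M-step: p_l = (1 / |I|) sum_i mu_il.
        p = [sum(mu[i][l] for i in range(n_items)) / n_items for l in range(K)]

        # E-step: log(p_l) + log(rho_il), then normalise over l in log space.
        log_post = [[math.log(max(p[l], 1e-300)) for l in range(K)]
                    for _ in range(n_items)]
        for (i, j), k in labels.items():
            for l in range(K):
                log_post[i][l] += math.log(pi[j][k][l])
        mu = []
        for row in log_post:
            m = max(row)
            w = [math.exp(v - m) for v in row]
            s = sum(w)
            mu.append([v / s for v in w])
    return mu, pi, p
```

Working with \(\log \rho _{il}\) and subtracting the row maximum before exponentiating avoids underflow when an instance has many crowd labels.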
This algorithm can be extended to preserve worker privacy. In the E-step, the parties calculate \(\{\log \rho _{il} \mid i\in {\mathcal I}, l\in {\mathcal K}\}\) using our secure sum protocol, and the requester calculates and broadcasts \(\{\mu _{il}\mid i\in {\mathcal I}, l\in {\mathcal K}\}\). In the M-step, each worker \(j\) calculates \(\{\pi _{jkl} \mid k,l\in {\mathcal K}\}\), and the requester calculates \(\{p_l \mid {l\in {\mathcal K}}\}\).
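The division of labour described in this paragraph can be illustrated as follows. The sketch only shows which quantities each party computes; the secure sum protocol itself is not reproduced, and `plain_sum` is a stand-in that adds the workers' terms in the clear and therefore provides no privacy. All function names are hypothetical, and the convention that a worker contributes zeros for instances it did not label is an assumption of this example.

```python
import math

def worker_log_terms(pi_j, own_labels, n_items, K):
    """Worker j's local summand of log(rho_il): log pi_j[y_ij][l], or 0 if unlabelled."""
    terms = [[0.0] * K for _ in range(n_items)]
    for i, k in own_labels.items():
        terms[i] = [math.log(pi_j[k][l]) for l in range(K)]
    return terms

def plain_sum(per_worker_terms):
    """Stand-in for the secure sum protocol (NOT private): elementwise addition."""
    n_items = len(per_worker_terms[0])
    K = len(per_worker_terms[0][0])
    return [[sum(t[i][l] for t in per_worker_terms) for l in range(K)]
            for i in range(n_items)]

def requester_posterior(log_rho, p):
    """Requester turns the summed log(rho_il) into mu_il, which it then broadcasts."""
    mu = []
    for row in log_rho:
        logit = [math.log(p[l]) + row[l] for l in range(len(p))]
        m = max(logit)
        w = [math.exp(v - m) for v in logit]
        s = sum(w)
        mu.append([v / s for v in w])
    return mu

def worker_confusion_update(mu, own_labels, K):
    """Worker j updates pi_j from the broadcast mu and its own labels only.
    Assumes sum_i mu_il > 0 over the worker's instances; no smoothing is applied."""
    pi_j = [[0.0] * K for _ in range(K)]
    for l in range(K):
        den = sum(mu[i][l] for i in own_labels)
        for i, k in own_labels.items():
            pi_j[k][l] += mu[i][l] / den
    return pi_j
```

In this sketch a worker's labels appear only inside `worker_log_terms` and `worker_confusion_update`, both of which run locally; the requester sees only the summed \(\log \rho _{il}\).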
Appendix 1.2: Real-valued labels
The LC method was modified to deal with real-valued labels by Raykar et al. (2010). Consider a task in which each instance is assigned a real-valued label. For each \(i\in {\mathcal I}\) and \(j\in {\mathcal {J}}\), a crowd label \(y_{i,j}\in \mathbb {R}\) is generated by the normal distribution
$$\begin{aligned} p(y_{i,j}\mid y_{i}, \tau _j, \gamma ) = \mathcal {N}(y_{i,j} \mid y_{i}, 1/\tau _j + 1/\gamma ), \end{aligned}$$
where \(\tau _j (> 0)\) is the precision parameter of the normal distribution, which is interpreted as the ability of worker \(j\), and \(\gamma \) acts as a regularization parameter. Let us denote \(1/\lambda _j := 1/\tau _j + 1/\gamma \). Assuming that the crowd labels were generated by this model, the true labels and the precision parameters are estimated by the following EM-like algorithm.
- E-step: for each \(i\in {\mathcal I}\), update the true label \(y_i\) as
$$\begin{aligned} y_i = \dfrac{\sum _{j\in {\mathcal {J}}_i} \lambda _j y_{i,j}}{\sum _{j\in {\mathcal {J}}_i} \lambda _j}. \end{aligned}$$
- M-step: for each \(j\in {\mathcal {J}}\), update \(\lambda _j\) by solving
$$\begin{aligned} \dfrac{1}{\lambda _j} = \dfrac{1}{|{\mathcal I}_j|}\sum _{i\in {\mathcal I}_j} (y_{i,j} - y_i)^2. \end{aligned}$$
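The two updates above amount to a precision-weighted mean followed by a per-worker variance estimate, as the following plain-Python sketch shows. The function name, the data layout, the initial value \(\lambda _j = 1\), and the default `gamma=100.0` are assumptions made for illustration. The sketch also floors each estimated variance at \(1/\gamma \), which is one reading of the constraint \(1/\lambda _j = 1/\tau _j + 1/\gamma \ge 1/\gamma \) and is not confirmed by the text; the floor keeps \(\lambda _j\) finite when a worker's residuals vanish.

```python
def real_valued_em(labels, n_items, n_workers, gamma=100.0, n_iter=20):
    """labels maps (item i, worker j) -> real-valued crowd label y_ij.
    Assumes every item and every worker has at least one label."""
    lam = [1.0] * n_workers  # assumed initial precisions
    y = [0.0] * n_items
    for _ in range(n_iter):
        # E-step: y_i = sum_j lam_j * y_ij / sum_j lam_j over workers who labelled i.
        num = [0.0] * n_items
        den = [0.0] * n_items
        for (i, j), v in labels.items():
            num[i] += lam[j] * v
            den[i] += lam[j]
        y = [num[i] / den[i] for i in range(n_items)]

        # M-step: 1 / lam_j = mean squared residual over worker j's instances.
        sq = [0.0] * n_workers
        cnt = [0] * n_workers
        for (i, j), v in labels.items():
            sq[j] += (v - y[i]) ** 2
            cnt[j] += 1
        for j in range(n_workers):
            var = max(sq[j] / cnt[j], 1.0 / gamma)  # assumed floor at 1 / gamma
            lam[j] = 1.0 / var
    return y, lam
```

Workers whose labels stay close to the current estimates receive a large \(\lambda _j\) and therefore a large weight in the next E-step.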
This algorithm can also be extended to preserve worker privacy. In the E-step, the parties calculate \(\left\{ \sum _{j\in {\mathcal {J}}_i} \lambda _j y_{i,j}, \sum _{j\in {\mathcal {J}}_i} \lambda _j \mid {i\in {\mathcal I}}\right\} \) using our secure sum protocol, and the requester calculates and broadcasts \(\{y_i \mid {i\in {\mathcal I}}\}\). In the M-step, each worker \(j\) calculates \(\lambda _j\).
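As in the multi-class case, the split between parties can be sketched directly. Each worker contributes the pair \((\lambda _j y_{i,j}, \lambda _j)\) for the instances it labelled, the two sums are formed, and the requester divides them. The function names are hypothetical, `plain_sum_pairs` is a non-private stand-in for the secure sum protocol, and contributing \((0, 0)\) for unlabelled instances is an assumption of this example.

```python
def worker_shares(lam_j, own_labels, n_items):
    """Worker j's local terms (lam_j * y_ij, lam_j); (0, 0) for unlabelled items."""
    shares = [(0.0, 0.0)] * n_items
    for i, v in own_labels.items():
        shares[i] = (lam_j * v, lam_j)
    return shares

def plain_sum_pairs(all_shares):
    """Stand-in for the secure sum protocol (NOT private): adds both components."""
    n_items = len(all_shares[0])
    return [(sum(s[i][0] for s in all_shares), sum(s[i][1] for s in all_shares))
            for i in range(n_items)]

def requester_weighted_mean(totals):
    """Requester computes y_i from the two sums and broadcasts the result.
    Assumes every item was labelled by at least one worker."""
    return [num / den for num, den in totals]

def worker_precision_update(y, own_labels):
    """Worker j updates lam_j from the broadcast y and its own labels only.
    Assumes the mean squared residual is non-zero."""
    var = sum((v - y[i]) ** 2 for i, v in own_labels.items()) / len(own_labels)
    return 1.0 / var
```

Here too the individual labels \(y_{i,j}\) and precisions \(\lambda _j\) are used only in the worker-side functions.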