In [27], Pfitzmann and Hansen provide definitions for anonymity, unlinkability, unobservability and pseudonymity. Although stated outside the context of a formal framework, the definitions in this seminal work have served as a reference point for researchers seeking to understand privacy notions. In this section, we formally express these notions (and not only these) by carefully specifying a corresponding leakage function for each.
Basic leakage sets. Below, we define some useful sets that enable a succinct description of the various leakage functions that we introduce. In our formalization, leakage derives from the history entries that are in a ‘pending’ mode. This is for technical reasons: the ideal-world simulator \(\mathsf {Sim}\) (cf. Sect. 3.3) must be aware of the actions to be taken by the email privacy functionality \(\mathcal {F}^{\mathsf {Leak},\varDelta _\mathsf {net}}_\mathsf {priv}(\mathbf {P})\) before allowing their execution, so that it can simulate the real-world run in an indistinguishable manner. In the following, the symbol \(*\) denotes a wildcard, and \(\mathsf {ptr}'\le \mathsf {ptr}\) denotes that the entry indexed with pointer \(\mathsf {ptr}'\) was added no later than the entry with pointer \(\mathsf {ptr}\).
– The active address set for H by pointer \(\mathsf {ptr}\):
$$\begin{aligned} \begin{aligned}&\mathsf {Act}_\mathsf {ptr}[H]:=\Big \{C_\ell @\mathsf {SP}_i\;\Big |\;\exists \mathsf {ptr}'\le \mathsf {ptr}:\Big [\big [\big (\mathsf {ptr}',\big (\mathsf {sid},*,\textsc {Active},C_\ell @\mathsf {SP}_i\big ),`\textsf {pending}'\big )\in H\big ]\vee \\&\;\vee \big [\big (\mathsf {ptr}',\big (\mathsf {sid},*,\textsc {Register},C_\ell @\mathsf {SP}_i\big ),`\textsf {pending}'\big )\in H\big ]\Big ] \wedge \\&\;\wedge \Big [\forall \mathsf {ptr}'':\mathsf {ptr}'\le \mathsf {ptr}''\le \mathsf {ptr}\Rightarrow \big (\mathsf {ptr}'',\big (\mathsf {sid},*,\textsc {Inactive},C_\ell @\mathsf {SP}_i\big ),`\textsf {pending}'\big )\notin H\Big ]\Big \}. \end{aligned} \end{aligned}$$
Note. To simplify the notation and terminology that follows, we consider as active all the addresses that are in a pending registration status.
– The sender set for H by pointer \(\mathsf {ptr}\):
$$ \mathbf {S}_\mathsf {ptr}[H]:=\Big \{C_s@\mathsf {SP}_i\;\Big |\;\exists \mathsf {ptr}'\le \mathsf {ptr}:\big (\mathsf {ptr}',\big (\mathsf {sid},*,\textsc {Send},\langle C_s@\mathsf {SP}_i,*,*\rangle \big ),`\textsf {pending}'\big )\in H\Big \}. $$
– The sender multiset for H by pointer \(\mathsf {ptr}\), denoted by \(\llbracket \mathbf {S}_\mathsf {ptr}\rrbracket [H]\), is defined analogously. The difference from \(\mathbf {S}_\mathsf {ptr}[H]\) is that each address \(C_s@\mathsf {SP}_i\) carries the number of pending \(\textsc {Send}\) messages it has provided.
– The message-sender set for H by pointer \(\mathsf {ptr}\):
$$\begin{aligned} \begin{aligned}&\mathbf {MS}_\mathsf {ptr}[H]:=\Big \{(M,C_s@\mathsf {SP}_i)\;\Big |\;\exists \mathsf {ptr}'\le \mathsf {ptr}:\\&\quad \quad \big (\mathsf {ptr}',\big (\mathsf {sid},*,\textsc {Send},\langle C_s@\mathsf {SP}_i,M,*\rangle \big ),`\textsf {pending}'\big )\in H\Big \}. \end{aligned} \end{aligned}$$
– The recipient set for H by pointer \(\mathsf {ptr}\):
$$\begin{aligned} \begin{aligned}&\mathbf {R}_\mathsf {ptr}[H]:=\Big \{C_r@\mathsf {SP}_j\;\Big |\;\exists \mathsf {ptr}'\le \mathsf {ptr}:\\&\quad \big (\mathsf {ptr}',\big (\mathsf {sid},*,\textsc {Send},\langle *,*,C_r@\mathsf {SP}_j\rangle \big ),`\textsf {pending}'\big )\in H\Big \}. \end{aligned} \end{aligned}$$
– The recipient multiset for H by pointer \(\mathsf {ptr}\), denoted by \(\llbracket \mathbf {R}_\mathsf {ptr}\rrbracket [H]\), is defined analogously. The difference from \(\mathbf {R}_\mathsf {ptr}[H]\) is that each address \(C_r@\mathsf {SP}_j\) carries the number of pending \(\textsc {Send}\) messages intended for it.
– The message-recipient set for H by pointer \(\mathsf {ptr}\):
$$\begin{aligned} \begin{aligned}&\mathbf {MR}_\mathsf {ptr}[H]:=\Big \{(M,C_r@\mathsf {SP}_j)\;\Big |\;\exists \mathsf {ptr}'\le \mathsf {ptr}:\\&\quad \quad \big (\mathsf {ptr}',\big (\mathsf {sid},*,\textsc {Send},\langle *,M,C_r@\mathsf {SP}_j\rangle \big ),`\textsf {pending}'\big )\in H\Big \}. \end{aligned} \end{aligned}$$
– The set of fetching clients for H by pointer \(\mathsf {ptr}\):
$$ \mathbf {F}_\mathsf {ptr}[H]:=\Big \{C_r@\mathsf {SP}_j\;\Big |\;\exists \mathsf {ptr}'\le \mathsf {ptr}:\big (\mathsf {ptr}',\big (\mathsf {sid},*,\textsc {Fetch},C_r@\mathsf {SP}_j\big ),`\textsf {pending}'\big )\in H\Big \}. $$
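To make the set definitions above concrete, the following Python sketch computes them over a toy history. The encoding is an assumption made purely for illustration: each entry is a tuple `(ptr, command, payload, status)`, the `sid` and wildcard fields are dropped, and all function names and addresses are hypothetical.

```python
from collections import Counter

PENDING = "pending"

# Hypothetical encoding of a history entry: (ptr, command, payload, status).
# For Send, payload is (sender, message, recipient); otherwise it is an address.
H = [
    (1, "Register", "alice@SP1", PENDING),
    (2, "Active", "bob@SP2", PENDING),
    (3, "Send", ("alice@SP1", "hi", "bob@SP2"), PENDING),
    (4, "Send", ("alice@SP1", "again", "bob@SP2"), PENDING),
    (5, "Inactive", "bob@SP2", PENDING),
    (6, "Fetch", "bob@SP2", PENDING),
]

def pending(H, ptr, cmd):
    """Return (ptr', payload) for pending `cmd` entries with ptr' <= ptr."""
    return [(p, payload) for (p, c, payload, status) in H
            if p <= ptr and c == cmd and status == PENDING]

def act(H, ptr):
    """Act_ptr[H]: a pending Active/Register entry at ptr' <= ptr with no
    pending Inactive entry for the same address at any ptr'' in [ptr', ptr]."""
    starts = pending(H, ptr, "Active") + pending(H, ptr, "Register")
    stops = pending(H, ptr, "Inactive")
    return {addr for (p1, addr) in starts
            if not any(a == addr and p1 <= p2 for (p2, a) in stops)}

def sends(H, ptr):
    """Payloads (sender, message, recipient) of pending Send entries up to ptr."""
    return [payload for (_, payload) in pending(H, ptr, "Send")]

def sender_set(H, ptr):
    return {s for (s, _, _) in sends(H, ptr)}

def sender_multiset(H, ptr):
    return Counter(s for (s, _, _) in sends(H, ptr))

def message_sender_set(H, ptr):
    return {(m, s) for (s, m, _) in sends(H, ptr)}

def recipient_set(H, ptr):
    return {r for (_, _, r) in sends(H, ptr)}

def recipient_multiset(H, ptr):
    return Counter(r for (_, _, r) in sends(H, ptr))

def message_recipient_set(H, ptr):
    return {(m, r) for (_, m, r) in sends(H, ptr)}

def fetch_set(H, ptr):
    return {addr for (_, addr) in pending(H, ptr, "Fetch")}
```

In this toy history, `bob@SP2` belongs to the active set at pointer 4 but not at pointer 6, because of the pending `Inactive` entry at pointer 5, while the sender multiset at pointer 6 records two pending messages for `alice@SP1`.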
Unobservability. Unobservability is the state where “the messages are not discernible from random noise”. Here, we focus on the case of relationship unobservability, which we refer to as unobservability for brevity: within the set of all possible sender-recipient pairs, it is not discernible whether a message is exchanged in any relationship. Hence, in our setting the unobservability set is the set of the users that are online, i.e. only the “activity bit” is leaked. As a result, we can define the unobservability leakage function \(\mathsf {Leak}_\mathsf {unob}\) as the active address set:
$$\begin{aligned} \mathsf {Leak}_\mathsf {unob}(\mathsf {ptr},H):=\mathsf {Act}_\mathsf {ptr}[H]. \end{aligned}$$
(1)
Remark 2
(Unobservability as a golden standard for email privacy). In our UC formalization of email ecosystems, we consider a dynamic scenario where the clients register, go online/offline and make custom fetch requests, which is consistent with the real-world dynamics of email communication. It is easy to see that in such a setting the clients’ online/offline status may be leaked to a global observer. E.g., the environment may provide send requests to offline clients and notify the global adversary that it has provided the said requests, so that the latter can check the activity of those clients. Hence, in our framework, unobservability as defined in Eq. (1) sets a “golden standard” for optimal privacy. In Sect. 5, we show that this golden standard is feasible in principle. Namely, we describe a theoretical construction with quadratic communication complexity and we prove that it achieves unobservability. As a result, that construction sets one extreme point in the privacy vs. efficiency trade-off for the client-server email infrastructure, the other being a simple and fast network with no security enhancements. Clearly, the challenge for every email construction is to balance this trade-off between the two extreme points.
We conclude our remark by noting that a higher level of privacy (e.g., no leakage at all) could be possible if we considered an alternative setting where the email addresses are given a priori, the clients are always online and mail delivery is via continuous push by the SPs. However, we believe that such a setting is too restrictive to formally capture what an email ecosystem is in general.
Anonymity. According to [27], anonymity “is the state of being not identifiable within a set of subjects, the anonymity set”. In the email scenario, a sender (resp. recipient) should be anonymous within the set of potential senders (resp. recipients), i.e. the sender (resp. recipient) anonymity set. In addition, anonymity sets may change over time; in our framework, they are updated per time slot, via global clock advancement. We recall from the discussion in Remark 2 that in our setting, the anonymity sets are restricted to the set of online users.
We define the predicate \(\mathsf {End}(\cdot ,\cdot )\) over the pointers and history transcripts to denote that a pointer \(\mathsf {ptr}\) refers to the last history entry before the functionality enters the Clock advancement phase in order to finalize execution for the running time slot. By the above, we define the anonymity leakage function, \(\mathsf {Leak}_\mathsf {anon}\), as follows:
$$\begin{aligned} \mathsf {Leak}_\mathsf {anon}(\mathsf {ptr},H):=\left\{ \begin{array}{ll} \big (\mathbf {S}_\mathsf {ptr}[H],\mathbf {R}_\mathsf {ptr}[H],\mathsf {Act}_\mathsf {ptr}[H]\big ),&{}\text { if }\mathsf {End}(\mathsf {ptr},H)\\ \mathsf {Act}_\mathsf {ptr}[H],&{} \text { otherwise} \end{array} \right. \end{aligned}$$
(2)
Unlinkability. Unlinkability of items of interest (e.g. subjects, messages, etc.) means that “the ability of the attacker to relate these items does not increase by observing the system”. Here, we provide an example of unlinkability from the sender side, where the message and its intended recipient cannot be related to the original sender. We define the sender-side unlinkability leakage function \(\mathsf {Leak}_\mathsf {s.unlink}\) as follows:
$$\begin{aligned} \mathsf {Leak}_\mathsf {s.unlink}(\mathsf {ptr},H):=\left\{ \begin{array}{ll} \big (\mathbf {S}_\mathsf {ptr}[H],\mathbf {MR}_\mathsf {ptr}[H],\mathsf {Act}_\mathsf {ptr}[H]\big ),&{}\text { if }\mathsf {End}(\mathsf {ptr},H)\\ \mathsf {Act}_\mathsf {ptr}[H],&{} \text { otherwise} \end{array} \right. \end{aligned}$$
(3)
Alternatively, we may define unlinkability from the recipient side via the function
$$\begin{aligned} \mathsf {Leak}_\mathsf {r.unlink}(\mathsf {ptr},H):=\left\{ \begin{array}{ll} \big (\mathbf {MS}_\mathsf {ptr}[H],\mathbf {R}_\mathsf {ptr}[H],\mathsf {Act}_\mathsf {ptr}[H]\big ),&{}\text { if }\mathsf {End}(\mathsf {ptr},H)\\ \mathsf {Act}_\mathsf {ptr}[H],&{} \text { otherwise} \end{array} \right. \end{aligned}$$
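The anonymity and unlinkability leakage functions share one shape: a tuple of basic sets at the end of a slot, and only the active address set otherwise. The sketch below illustrates this under two assumptions that are not part of the formal model: the basic sets for \((\mathsf {ptr},H)\) are precomputed and passed in as a dictionary, and the predicate \(\mathsf {End}(\mathsf {ptr},H)\) is supplied as a Boolean flag `is_end`. All names are hypothetical.

```python
def per_slot_leak(sets, is_end, fields):
    """Return the listed basic sets followed by Act when End(ptr, H) holds,
    and Act alone otherwise."""
    if is_end:
        return tuple(sets[f] for f in fields) + (sets["Act"],)
    return sets["Act"]

def leak_anon(sets, is_end):
    # Eq. (2): (S, R, Act) at the end of a slot.
    return per_slot_leak(sets, is_end, ("S", "R"))

def leak_s_unlink(sets, is_end):
    # Eq. (3): (S, MR, Act) at the end of a slot.
    return per_slot_leak(sets, is_end, ("S", "MR"))

def leak_r_unlink(sets, is_end):
    # Recipient-side variant: (MS, R, Act) at the end of a slot.
    return per_slot_leak(sets, is_end, ("MS", "R"))

# Hypothetical basic sets for some fixed (ptr, H).
sets = {
    "Act": {"alice@SP1", "bob@SP2"},
    "S": {"alice@SP1"},
    "R": {"bob@SP2"},
    "MS": {("hi", "alice@SP1")},
    "MR": {("hi", "bob@SP2")},
}
```

The three functions differ only in which basic sets they release at the end of the slot; in the middle of a slot all of them coincide with the unobservability leakage.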
Pseudonymity. According to [27], “being pseudonymous is the state of using a pseudonym as ID”. To capture pseudonymity, we slightly abuse the definition and consider leakage as a randomized function (or program). Namely, the functionality initially chooses a random permutation \(\pi \) over the set of clients \(\mathbf {C}\), and the pseudonym of each client \(C_\ell \) is \(\pi (C_\ell )\in [n]\). We denote by \(\pi [H]\) the “pseudonymized history” w.r.t. \(\pi \), i.e. in every entry of H we replace \(C_\ell \) by \(\pi (C_\ell )\). Clearly, in our infrastructure, the clients remain pseudonymous among the set of clients that are registered to the same SP. We define the pseudonymity leakage function as follows:
$$\begin{aligned} \mathsf {Leak}_\mathsf {pseudon}(\mathsf {ptr},H):=\pi [H],\; \text { where }\pi \overset{\$}{\leftarrow }\big \{f\;\big |\;f:\mathbf {C}\longrightarrow [n]\big \}. \end{aligned}$$
(4)
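As an illustration of the pseudonymized history \(\pi [H]\), the following sketch samples a random bijection from client names to \(\{1,\ldots ,n\}\) and rewrites every address in a toy history. The entry encoding `(ptr, command, payload, status)`, the `client@SP` string format and all function names are assumptions made for this example; Python's `random.Random` is used only for reproducibility and is not a cryptographically secure source of randomness.

```python
import random

def sample_pseudonyms(clients, rng):
    """Sample a uniformly random bijection from clients to {1, ..., n}."""
    labels = list(range(1, len(clients) + 1))
    rng.shuffle(labels)
    return dict(zip(sorted(clients), labels))

def pseudonymize_address(addr, pi):
    """Replace the client part of 'client@SP' by its pseudonym; keep the SP."""
    client, sp = addr.split("@")
    return f"{pi[client]}@{sp}"

def pseudonymize_history(H, pi):
    """Return pi[H]: every client name in every entry replaced by its pseudonym."""
    out = []
    for (ptr, cmd, payload, status) in H:
        if cmd == "Send":
            sender, message, recipient = payload
            payload = (pseudonymize_address(sender, pi), message,
                       pseudonymize_address(recipient, pi))
        else:
            payload = pseudonymize_address(payload, pi)
        out.append((ptr, cmd, payload, status))
    return out

# Hypothetical toy history.
H = [
    (1, "Register", "alice@SP1", "pending"),
    (2, "Active", "bob@SP2", "pending"),
    (3, "Send", ("alice@SP1", "hi", "bob@SP2"), "pending"),
]
pi = sample_pseudonyms({"alice", "bob"}, random.Random(0))
PH = pseudonymize_history(H, pi)
```

Since only the client part of each address is replaced, the SP of every pseudonym stays visible, which matches the observation that clients remain pseudonymous among the clients registered to the same SP.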
Besides anonymity, unlinkability, unobservability and pseudonymity defined in [27], other meaningful notions of privacy can be formally expressed in our framework. We present two such notions below.
Weak anonymity. We define weak anonymity as the privacy notion where the number of messages that a client sends or receives and her fetching activity are leaked. In this weaker notion, the anonymity set for a sender (resp. recipient) consists of the subset of senders (resp. recipients) that are associated with the same number of pending messages. In addition, the leakage for the sender anonymity set is now released gradually according to the protocol scheduling, whereas the recipient anonymity set is still leaked “per slot”. The weak anonymity leakage function, \(\mathsf {Leak}_\mathsf {w.anon}\), is defined via the sender and recipient multisets as follows:
$$\begin{aligned} \mathsf {Leak}_\mathsf {w.anon}(\mathsf {ptr},H):=\left\{ \begin{array}{ll} \big (\llbracket \mathbf {S}_\mathsf {ptr}\rrbracket [H],\llbracket \mathbf {R}_\mathsf {ptr}\rrbracket [H],\mathbf {F}_\mathsf {ptr}[H],\mathsf {Act}_\mathsf {ptr}[H]\big ),&{}\text { if }\mathsf {End}(\mathsf {ptr},H)\\ \big (\llbracket \mathbf {S}_\mathsf {ptr}\rrbracket [H],\mathbf {F}_\mathsf {ptr}[H],\mathsf {Act}_\mathsf {ptr}[H]\big ),&{} \text { otherwise} \end{array} \right. \end{aligned}$$
(5)
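The effect of leaking multisets instead of sets can be sketched as follows. The code assumes the multisets are given as `collections.Counter` objects and that \(\mathsf {End}(\mathsf {ptr},H)\) is supplied as a Boolean flag; the helper `anonymity_classes` and all addresses are hypothetical and only illustrate how a weak anonymity set groups addresses with equal message counts.

```python
from collections import Counter, defaultdict

def leak_w_anon(S_multi, R_multi, F, Act, is_end):
    """Eq. (5): the recipient multiset is released only at the end of a slot."""
    if is_end:
        return (S_multi, R_multi, F, Act)
    return (S_multi, F, Act)

def anonymity_classes(multiset):
    """Group addresses by their number of pending messages."""
    classes = defaultdict(set)
    for addr, count in multiset.items():
        classes[count].add(addr)
    return dict(classes)

# Hypothetical multisets and sets for some fixed (ptr, H).
S_multi = Counter({"alice@SP1": 2, "bob@SP2": 2, "carol@SP1": 1})
R_multi = Counter({"dave@SP2": 5})
F = {"dave@SP2"}
Act = {"alice@SP1", "bob@SP2", "carol@SP1", "dave@SP2"}

classes = anonymity_classes(S_multi)
```

Here `alice@SP1` and `bob@SP2` fall into the same sender anonymity set because both have two pending messages, while `carol@SP1` is alone in hers.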
Remark 3
Even though it is not a very strong privacy notion, weak anonymity supports a reasonable level of privacy for email realizations that aim at a manageable overhead and practical use. Indeed, observe that if we cannot tolerate blowing up the ecosystem’s complexity by requiring some form of cover traffic (which is a plausible requirement in practical scenarios), then a global adversary monitoring the client-SP channel can easily infer the number of sent/received messages over this channel. Moreover, one may informally argue that if the email users do not vary significantly in terms of their sending and fetching activity (or at least can be grouped into large enough sets of similar activity), weak anonymity and standard anonymity are not far apart. In Sect. 6, we present an efficient weakly anonymous email construction based on parallel mixing [18, 19].
End-to-end encryption. The standard notion of end-to-end encryption, now applied in many internet applications (e.g., Signal, WhatsApp, Viber, Facebook Messenger, Skype), suggests hiding the content of M in the communication of the end users (up to the message length |M|), in our case the sender and the recipient. Hence, we define the end-to-end leakage function \(\mathsf {Leak}_\mathsf {e2e}\) as shown below.
$$\begin{aligned} \begin{aligned} \mathsf {Leak}_\mathsf {e2e}(\mathsf {ptr},H)&:=\Big (\mathsf {Act}_\mathsf {ptr}[H],\Big \{(C_s@\mathsf {SP}_i,|M|,C_r@\mathsf {SP}_j)\;\Big |\;\exists \mathsf {ptr}'\le \mathsf {ptr}:\\&\quad \quad \big (\mathsf {ptr}',\big (\mathsf {sid},*,\textsc {Send},\langle C_s@\mathsf {SP}_i,M,C_r@\mathsf {SP}_j\rangle \big ),`\textsf {pending}'\big )\in H\Big \}\Big ). \end{aligned} \end{aligned}$$
(6)
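A minimal sketch of the end-to-end leakage follows, assuming a hypothetical entry encoding `(ptr, command, payload, status)` with `Send` payloads of the form `(sender, message, recipient)`, and assuming the active address set is computed elsewhere and passed in. Only the length of each message reaches the output.

```python
PENDING = "pending"

def leak_e2e(H, ptr, act):
    """Return (Act, {(sender, |M|, recipient)}) over pending Send entries
    with pointer at most ptr; message contents are never copied out."""
    triples = set()
    for (p, cmd, payload, status) in H:
        if p <= ptr and cmd == "Send" and status == PENDING:
            sender, message, recipient = payload
            triples.add((sender, len(message), recipient))
    return (act, triples)

# Hypothetical toy history and active set.
H = [
    (1, "Send", ("alice@SP1", "hi", "bob@SP2"), PENDING),
    (2, "Send", ("alice@SP1", "yo", "bob@SP2"), PENDING),
    (3, "Send", ("bob@SP2", "hello", "alice@SP1"), PENDING),
]
act = {"alice@SP1", "bob@SP2"}
leak = leak_e2e(H, 3, act)
```

Because the leakage is a set, the two messages from `alice@SP1` to `bob@SP2` collapse into a single triple: they have the same length, so the set does not distinguish them.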
Relation between privacy notions. Observe that the relation between two privacy notions can be deduced via their corresponding leakage functions. Namely, if for every \((\mathsf {ptr},H)\) a PPT adversary, given the output of a leakage function \(\mathsf {Leak}_1(\mathsf {ptr},H)\), can derive the output of some other leakage function \(\mathsf {Leak}_2(\mathsf {ptr},H)\), then \(\mathsf {Leak}_2(\cdot ,\cdot )\) refers to a stronger notion of privacy than \(\mathsf {Leak}_1(\cdot ,\cdot )\). In Fig. 3, given the definitions of \(\mathsf {Leak}_\mathsf {unob},\mathsf {Leak}_\mathsf {anon},\mathsf {Leak}_\mathsf {s.unlink}/\mathsf {Leak}_\mathsf {r.unlink},\mathsf {Leak}_\mathsf {w.anon},\mathsf {Leak}_\mathsf {e2e}\) above, we relate the respective notions in an intuitively consistent way.
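As one instance of this criterion, the sketch below derives the output of \(\mathsf {Leak}_\mathsf {anon}\) from the output of \(\mathsf {Leak}_\mathsf {s.unlink}\) by projecting the message-recipient set onto its recipients. It assumes a hypothetical Python encoding in which the end-of-slot output is a tuple `(S, MR, Act)` and the mid-slot output is the bare active address set.

```python
def anon_from_s_unlink(leak):
    """Derive Leak_anon(ptr, H) from Leak_s.unlink(ptr, H)."""
    if not isinstance(leak, tuple):
        # Mid-slot: both functions output Act_ptr[H].
        return leak
    S, MR, Act = leak
    # Dropping the message from each (M, recipient) pair yields R_ptr[H].
    R = {recipient for (_message, recipient) in MR}
    return (S, R, Act)

# Hypothetical end-of-slot output of Leak_s.unlink.
s_unlink_out = (
    {"alice@SP1"},
    {("hi", "bob@SP2"), ("again", "bob@SP2")},
    {"alice@SP1", "bob@SP2"},
)
anon_out = anon_from_s_unlink(s_unlink_out)
```

The derivation is a simple projection, so by the criterion above anonymity is a stronger notion than sender-side unlinkability. The converse derivation is not possible in general, since the recipient set does not determine which messages were sent.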
Remark 4
We observe that pseudonymity cannot be compared to any of the notions in Fig. 3. Indeed, even for the stronger notion of unobservability, having the set of active addresses is not enough information to derive the pseudonyms. Conversely, having the entire email activity pseudonymized is not enough information to derive the active clients’ real identities. In addition, we can combine pseudonymity with some other privacy notion to obtain a new ‘pseudonymized’ version of the latter (e.g. pseudonymous unobservability/anonymity/etc.). It is easy to see that the new notions can also be expressed via suitable (randomized) leakage functions, by applying a random permutation on the clients’ identities and then defining leakage as in the original corresponding leakage function, up to this permutation. E.g., for \(\pi \overset{\$}{\leftarrow }\big \{f\;\big |\;f:\mathbf {C}\longrightarrow \mathbf {C}\big \}\), “pseudonymized unobservability” could be expressed via the leakage function
$$\begin{aligned} \mathsf {Leak}_\mathsf {ps.unob}(\mathsf {ptr},H):=\Big \{\pi (C_\ell )@\mathsf {SP}_i\;\big |\;C_\ell @\mathsf {SP}_i\in \mathsf {Act}_\mathsf {ptr}[H]\Big \}. \end{aligned}$$
Remark 5
As our E2E leakage does not cover fetch information, strictly speaking the implication from weak anonymity to E2E encryption only holds if the fetch behavior is either known in advance (e.g. because of the system specification) or irrelevant. One could also opt to add the fetch leakage to the E2E definition, but we believe there is little practical value in doing so.