Abstract
In this paper, we address the correlation problem in the anonymization of transactional data streams. We propose a bucketization-based technique, called (k, l)-clustering, to prevent such privacy breaches by ensuring that the same k individuals remain grouped together over the entire anonymized stream. We evaluate our algorithm in terms of utility by considering two different (k, l)-clustering approaches.
1 Introduction
We live in an era where the world is more connected than ever before and everything is digitized, from smartphones and smart vehicles to smart homes and smart cities, continually generating a tremendous amount of information. With this information at hand, many concerns arise; one in particular is the critical exposure of individuals’ privacy, putting their anonymity at risk [1, 2]. Several anonymization techniques have been developed in the literature to preserve privacy. Whether they are generalization-based techniques [3,4,5] that alter the original values or bucketization-based techniques [6,7,8,9,10] that preserve privacy by splitting the dataset into sensitive and non-sensitive tables to hide the link between their values, they all assume that there is a trade-off between good privacy and utility. This trade-off is required to keep the dataset suitable for analysis while preserving the individuals’ anonymity. However, it keeps anonymization vulnerable and unable to cope with all sorts of attacks [11,12,13,14]. It is indeed difficult to provide a completely anonymous dataset without losing utility. There are several reasons for this, notably the difficulty of presuming the adversary’s prior belief and her/his ability to gain insights after looking at the anonymized dataset. Besides, a dataset in which several tuples relate to the same individual may expose significant correlations between identifying and sensitive values. An adversary can use his/her knowledge of such correlations [11, 13], or use these correlations as foreground knowledge [15], to breach individuals’ privacy. To cope with this particular problem, safe grouping is proposed in [16, 17] to ensure that each individual’s tuples are grouped in one and only one quasi-identifying group (QI-group) that is at the same time l-diverse, respects a minimum diversity for identifying attribute values, and in which all individuals have an equal number of tuples.
(k, l)-diversity [18] is another technique that uses generalization to associate k distinct individuals with l-diverse QI-groups. While these techniques are useful in dealing with the correlation problem on bulk datasets, they provide no proof of effectiveness in anonymizing data streams, where data must be protected on the fly before being stored in an anonymized dataset: the anonymization technique has only a partial view of the dataset, limited to the batch of tuples undergoing anonymization.
Let us consider a car rental example scenario where each smart vehicle triggers an event between two peers in the form of a transaction to be stored in a dataset for analysis. Transactions are generated continuously as long as customers are driving their vehicles, forming a data stream. In this scenario, we assume that the anonymization must be performed on the stream of tuples generated by the data source to output an anonymized dataset in the form shown in Fig. 1.
The released 2-diverse dataset is divided into two separate tables to hide the link between the identifying and sensitive values, as in [6, 7, 19]. Within a QI-group, an identifying value cannot be associated with a sensitive value with a probability higher than 1/2. The problem arises when the identifying and sensitive values correlate across the QI-groups [16, 18, 20] (e.g., the first two QI-groups in Fig. 1(b)), implying that the values belong to the same individual.
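To make the splitting concrete, the following Java sketch illustrates an anatomy-style split of a batch of (id, sensitive) tuples into an identifying table and a sensitive table linked by a group id. This is our own illustration, not code from the paper; in particular, dropping duplicate sensitive values within a batch is a simplification of ours.

```java
import java.util.*;

// Minimal sketch of an anatomy-style bucketization split (after [6]):
// a batch of (id, sensitive) tuples is cut into QI-groups of l distinct
// sensitive values, then published as two tables linked by a group id.
public class AnatomySplit {

    // Returns two tables: element 0 holds (groupId, id) rows,
    // element 1 holds (groupId, sensitiveValue) rows.
    static List<String[]>[] split(List<String[]> tuples, int l) {
        List<String[]> idTable = new ArrayList<>();
        List<String[]> sensTable = new ArrayList<>();
        List<String[]> buffer = new ArrayList<>();
        Set<String> sensSeen = new HashSet<>();
        int groupId = 0;
        for (String[] t : tuples) {                  // t = {id, sensitiveValue}
            if (sensSeen.contains(t[1])) continue;   // simplification: drop duplicates to keep the group diverse
            buffer.add(t);
            sensSeen.add(t[1]);
            if (buffer.size() == l) {                // group holds l distinct sensitive values: release it
                groupId++;
                for (String[] b : buffer) {
                    idTable.add(new String[]{String.valueOf(groupId), b[0]});
                    sensTable.add(new String[]{String.valueOf(groupId), b[1]});
                }
                buffer.clear();
                sensSeen.clear();
            }
        }
        return new List[]{idTable, sensTable};       // a leftover buffer would be suppressed
    }

    public static void main(String[] args) {
        List<String[]> tuples = Arrays.asList(
            new String[]{"Allen_U1", "Paris"},
            new String[]{"Betty_U2", "Lyon"},
            new String[]{"Cathy_U3", "Nice"},
            new String[]{"David_U4", "Lille"});
        // Within each released QI-group, an id can be linked to a
        // sensitive value with probability at most 1/l = 1/2.
        List<String[]>[] tables = split(tuples, 2);
        System.out.println(tables[0].size() + " id rows, " + tables[1].size() + " sensitive rows"); // prints "4 id rows, 4 sensitive rows"
    }
}
```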
In this paper, we extend the work in [16, 17] to address the correlation problem in the anonymization of transactional data streams, where data changes dynamically and its distribution is imbalanced. We propose (k, l)-clustering, which continuously groups k distinct individuals into l-diverse QI-groups and ensures that these individuals remain grouped together in future releases of QI-groups. (k, l)-clustering keeps track of incoming identifying values to safely release them across the QI-groups. It is a bucketization technique that prevents attribute disclosure while releasing truthful information. Our contributions in this paper include:

Defining the privacy properties required to bound the correlations in a data stream.

Proposing a novel clustering approach to enforce the aforementioned privacy properties.
The remainder of this paper is organized as follows. In Sect. 2, we investigate works related to the anonymization of data streams. In Sect. 3, we define the basic concepts and definitions. We present our privacy model in Sect. 4 and describe the (k, l)-clustering approaches. Section 5 evaluates the performance of our algorithm by applying two clustering approaches to data streams.
2 Related Work
In [21], Cao et al. extend the definition of k-anonymity to data streams and propose CASTLE, a clustering-based algorithm that publishes k-anonymized clusters within an acceptable delay. An extension of CASTLE is presented in [22] to reduce the number of tuples in the clusters and to maximize the utility of the anonymized dataset. In another work [23], FAANST is proposed to anonymize numerical data streams. FADS, an anonymization algorithm proposed in [24, 25], offers convenient time and space complexity with additional constraints on the cluster size and the cluster reuse strategy. While these techniques extend privacy solutions based on k-anonymity and l-diversity to transactional data streams, they do not take into account the correlation of identifying and sensitive values across the QI-groups. Moreover, several studies [11, 13, 18, 20] have shown that correlation attacks can be launched not only on bucketization techniques but on generalization-based techniques as well.
A work similar to ours is presented in [26], where the authors include background knowledge in their anonymization algorithm to deal with strong adversaries. They propose a hierarchical agglomerative algorithm to prevent attribute and identity disclosure. However, the authors only address correlations known to the adversary; here, we consider that correlations can be mined from the dataset and used as foreground knowledge to link individuals to their sensitive values. Alternatively, in [20], the authors present a sequential bottom-up anonymization algorithm, KSAA, that uses generalization to protect against background knowledge attacks on different anonymized views of the same original dataset. KSAA clusters tuples and generates QI-groups satisfying the privacy model in the current view. It then checks whether the privacy constraint is still satisfied when several views are joined together. In contrast, our clustering algorithm is applied to a stream of tuples on the fly, where three requirements must be met: low retention of tuples, balanced memory usage, and reasonable runtime. In [27], the authors propose a generalization-based microaggregation algorithm for stream k-anonymity that meets a maximum delay constraint, without preserving the order of incoming tuples in the published stream, as in [21]. They then improve the preservation of the original order of the tuples by using steered microaggregation while adding the timestamp as an artificial attribute. As in [21], we do not publish the timestamp attribute due to privacy constraints; however, we use it for experimental purposes.
On the other hand, several notable works [29,30,31] address differential privacy [32] for streaming data. In this work, we choose a bucketization technique that publishes truthful information. We particularly extend previous works [16, 17] to address correlations in the data stream in data sharing scenarios.
3 Preliminary Definitions
In this section, we present the basic concepts and definitions to be used in the remainder of this paper.
Definition 1
(Tuple – t). In a relational dataset, a tuple t is a finite ordered list of values \(\{v_{1},v_{2},...,v_{b}\}\) where, given a set of attributes \(\{A_{1},...,A_{b}\}, \forall i (1\,{\le }\,i\,{\le }\,b)\,v_{i}\,=\,t[A_{i}]\) refers to the value of attribute \(A_{i}\) in t. We categorize attributes as follows:

\({Identifier\,(A^{id})}\) is an attribute whose value is linked to an individual in a given dataset, for example, a social security number pseudonymized so that it uniquely represents an individual without explicitly identifying her/him.

\({Sensitive\ attribute\,(A^{s})}\) reveals critical and sensitive information about a certain individual and must not be directly linked to individuals’ identifying values in data sharing, publishing or releasing scenarios.

\({Time\text {-}stamp\,(A^{ts})}\) indicates the arrival time of the tuple, i.e., its position in the stream. The timestamp is considered identifying and could be used to expose individuals’ privacy in a transactional data stream. Here, we do not publish the timestamp; we use it instead for evaluating the utility of our anonymization technique.
Definition 2
(Data Stream – S). A data stream \(S= t_{1}, t_{2},...\) is a continuously growing dataset composed of an infinite series of tuples received at each time instant. Let U be the set of individuals of a specific population; \(\forall u \in U\) we denote by \(S_u\) the set of tuples in S related to the individual u, where \(\forall t \in S_u\), \(t[A^{id}]={v_{id}}\).
Definition 3
(Cluster – C). Let \( S^\prime \subset S \) be a set of tuples in S. A cluster C over \(S^\prime \) is defined as a set of tuples \(\{t_1, ..., t_n\}\) and a centroid \(V_{id}\) consisting of a set of identifying values such that, \(\forall t \in C, t[A^{id}] \in {V}_{id}\). We use the notation \({V}_{id}(C)\) to denote the centroid \({V}_{id}\) of C.
Definition 4
(Equivalence class/QI-group) [1]. A quasi-identifier group (QI-group) is defined as a subset \(QI_j, j=1,2,...\) of released tuples in \(S^{*} = \bigcup _{j=1}^{\infty }QI_j\) such that, for any \(j_1 \ne j_2\), \(QI_{j_1} \cap QI_{j_2} =\emptyset \).
We retain the QI-group terminology for compatibility with the broader anonymization literature; in our setting, a QI-group may contain identifying as well as quasi-identifying attributes (Table 1).
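These definitions map naturally onto simple data types. The following Java sketch (class and field names are our own, not from the paper) mirrors Definitions 1–3 and enforces the centroid invariant of Definition 3:

```java
import java.util.*;

// Data types mirroring Definitions 1-3: a tuple carries an identifying
// value, a sensitive value and a timestamp; a cluster groups tuples and
// owns a centroid V_id of identifying values.
public class StreamModel {

    static class Tuple {
        final String id;        // A^id
        final String sensitive; // A^s
        final long timestamp;   // A^ts (kept for evaluation, never published)
        Tuple(String id, String sensitive, long timestamp) {
            this.id = id; this.sensitive = sensitive; this.timestamp = timestamp;
        }
    }

    static class Cluster {
        final Set<String> centroid = new LinkedHashSet<>(); // V_id(C)
        final List<Tuple> tuples = new ArrayList<>();
        // Invariant of Definition 3: every tuple's id belongs to the centroid.
        boolean accepts(Tuple t) { return centroid.contains(t.id); }
        void add(Tuple t) {
            if (!accepts(t)) throw new IllegalArgumentException("id not in centroid");
            tuples.add(t);
        }
    }

    public static void main(String[] args) {
        Cluster c = new Cluster();
        c.centroid.add("Allen_U1");
        c.add(new Tuple("Allen_U1", "Paris", 1L));
        System.out.println(c.tuples.size()); // prints "1"
    }
}
```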
4 Privacy Preservation
We work under the assumption that the anonymization of the data stream will continuously release l-diverse QI-groups, and that these QI-groups, if joined together, will not expose unsafe correlations between identifying and sensitive values. We define two types of adversaries, passive and active.

Passive adversary has no prior knowledge concerning the individuals and the correlations of their identifying and sensitive values in the dataset. She/he is able, however, to extract foreground knowledge from the anonymized dataset that can be used to breach privacy, for example, knowing the renting patterns of individuals, which might lead to linking their identifying values to their identities and tracking them in the anonymized dataset.

Active adversary is equipped with certain knowledge about the individuals and the correlations of their identifying and sensitive values in the dataset before having access to its anonymized version. She/he can use that background knowledge to provoke a privacy breach. In our renting example, knowing the true identity, in plain text, of an individual (e.g., full name) alongside her/his location patterns might lead to linking her/his identity to her/his identifying value in the stream, thus exposing her/him in the anonymized dataset.
4.1 Privacy Model
Given a stream S and two user-defined constants \(l\ge 2\) and \(k\ge 2\), we say that an anonymization technique safely anonymizes S if it produces a stream \(S^*\) that satisfies the following properties:
Property 1
(Safe release of QI-groups). Provides safe correlation of identifying and sensitive values across the released QI-groups such that the intersection of any QI-groups in \(S^*\) on their identifying attribute \(A^{id}\) yields either k identifying values or none. Formally,
\(\forall v_{id} \in \mathcal {D}(A^{id})\), if \(v_{id} \in \pi _{A^{id}}QI_1\,\cap ...\,\cap \pi _{A^{id}}QI_j \), then there exists a set of identifying values \(V_{id} \subseteq \mathcal {D}(A^{id})\), such that \(V_{id} =\{v_{id}, v_{id_1},...,v_{id_{k-1}}\} \) and \(V_{id}=\pi _{A^{id}}QI_1\,\cap ...\,\cap \pi _{A^{id}}QI_j \).
Less formally, the identifying values that are grouped together in a QI-group must always remain grouped together throughout the entire anonymized stream.
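As an illustration of how Property 1 could be audited over a set of released QI-groups, the following Java sketch performs a pairwise check on the groups' projections on \(A^{id}\) (the pairwise formulation and method names are our own simplification):

```java
import java.util.*;

// Sketch of a Property 1 audit: for every pair of released QI-groups,
// the intersection of their projections on A^id must contain either
// exactly k identifying values or none.
public class SafeReleaseCheck {

    static boolean safeRelease(List<Set<String>> qiGroupIds, int k) {
        for (int i = 0; i < qiGroupIds.size(); i++) {
            for (int j = i + 1; j < qiGroupIds.size(); j++) {
                Set<String> inter = new HashSet<>(qiGroupIds.get(i));
                inter.retainAll(qiGroupIds.get(j));   // set intersection
                if (!inter.isEmpty() && inter.size() != k) return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        // QI-groups released from the same cluster share all k=2 ids;
        // groups from different clusters share none.
        List<Set<String>> ok = Arrays.asList(
            new HashSet<>(Arrays.asList("U1", "U2")),
            new HashSet<>(Arrays.asList("U1", "U2")),
            new HashSet<>(Arrays.asList("U3", "U4")));
        System.out.println(safeRelease(ok, 2)); // prints "true"
        // A partial overlap of a single id violates the property.
        List<Set<String>> bad = Arrays.asList(
            new HashSet<>(Arrays.asList("U1", "U2")),
            new HashSet<>(Arrays.asList("U1", "U3")));
        System.out.println(safeRelease(bad, 2)); // prints "false"
    }
}
```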
Property 2
(l-diverse QI-groups). Ensures that all the anonymized and released QI-groups are l-diverse. Formally,
\(\forall v_{id} \in \mathcal {D}(A^{id}), \forall QI \in S^*, \forall v_s \in \mathcal {D}(A^{s}),\ Pr(v_{id}, v_s \mid QI) \le 1/l\).
Property 3
(Safe correlation of identifying values). Prohibits linking correlated identifying values in the same QI-group to their corresponding sensitive values, which would result in an inherent violation of l-diversity [16,17,18]. Formally,
\(\forall v_{id_1},v_{id_2}, f(v_{id_1}, QI_j) = f(v_{id_2}, QI_j) \) where \(f(v_{id_i}, QI_j)\) is a function that returns the number of occurrences of \(v_{id_i}\) in \(QI_j\).
Property 3 hides frequent correlations of identifying values in the same QI-groups. It handles cases in which an adversary could link an individual to his/her sensitive value or narrow down the possibilities for other individuals.
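Properties 2 and 3 can be checked per QI-group, as sketched below in Java (the representation of a QI-group as a list of (id, sensitive) pairs and the method names are ours):

```java
import java.util.*;

// Sketch auditing Properties 2 and 3 on a single QI-group: the group is
// l-diverse when no sensitive value exceeds a 1/l fraction of the group,
// and every identifying value must occur equally often (Property 3).
public class GroupChecks {

    static boolean lDiverse(List<String[]> group, int l) {
        Map<String, Integer> freq = new HashMap<>();
        for (String[] t : group) freq.merge(t[1], 1, Integer::sum);
        for (int c : freq.values())
            if (c * l > group.size()) return false; // Pr(v_s | QI) > 1/l
        return true;
    }

    static boolean equalOccurrences(List<String[]> group) {
        Map<String, Integer> occ = new HashMap<>();
        for (String[] t : group) occ.merge(t[0], 1, Integer::sum);
        // f(v_id1, QI) = f(v_id2, QI) for all pairs of identifying values
        return new HashSet<>(occ.values()).size() <= 1;
    }

    public static void main(String[] args) {
        List<String[]> qi = Arrays.asList(
            new String[]{"U1", "Paris"}, new String[]{"U2", "Lyon"});
        System.out.println(lDiverse(qi, 2) && equalOccurrences(qi)); // prints "true"
    }
}
```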
4.2 (k, l)Clustering for Privacy Preservation
To preserve our privacy properties, we propose a (k, l)-clustering technique that groups tuples into clusters with disjoint centroids and releases, from these clusters, l-diverse QI-groups containing k distinct identifying values. In brief, our clustering technique works as follows:

It creates centroids containing k distinct identifying values: \(\forall QI_i,QI_j\), two QI-groups released from C, \(\pi _{A^{id}}QI_i = \pi _{A^{id}}QI_j = V_{id}(C)\) where \(|V_{id}(C)| = k\).

It ensures that an identifying value exists in one and only one centroid: \(\forall C_1, C_2\), \(V_{id}(C_1) \cap V_{id}(C_2)=\emptyset \).

It releases a QI-group from a cluster C such that: \(\forall QI\), a QI-group created from a subset of tuples in the cluster C, and \(\forall t \in QI\), \(t[A^{id}] \in V_{id}(C)\).
(k, l)-clustering is a bucketization technique that releases l-diverse QI-groups created from a subset of clusters having disjoint centroids. It ensures safe correlation of identifying and sensitive values across the QI-groups, i.e., once k identifying values are grouped in a QI-group, they will remain grouped together in future releases of QI-groups throughout the anonymized stream. We assume that the clustering can be done in two ways, unsupervised and supervised, as defined below.

Unsupervised (k, l)-clustering has no prior knowledge about the distribution of identifying values in the original dataset. The clustering is done on a first-come, first-served basis, inspired by “bottom-up” agglomerative clustering algorithms [26]. Unsupervised (k, l)-clustering creates cluster centroids and groups tuples accordingly, with reference to their identifying values and the privacy constants k and l.

Supervised (k, l)-clustering has a partial or full view over the distribution of identifying values in the original dataset. Thus, unlike unsupervised clustering, clusters are created based on a predefined set of centroids \(\mathcal {V}=\{V^1_{id}, ..., V^m_{id}\}\) that are fed to the clustering technique prior to the anonymization. Hence, identifying and sensitive values that are highly correlated are grouped together in the same cluster to reduce the chances of having these values anonymized/suppressed to meet the privacy properties.
As shown in Fig. 2(c), ‘Allen_U1’ and ‘Cathy_U3’ are grouped together in 3 QI-groups because they occur the most in the incoming stream. However, in Fig. 2(b), ‘Allen_U1’ is grouped alongside ‘Betty_U2’ and ‘Cathy_U3’ alongside ‘David_U4’ due to the order of their tuples in the data stream.
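The unsupervised mode can be sketched as follows; this is a minimal illustration of first-come, first-served centroid formation, not the paper's actual implementation:

```java
import java.util.*;

// Sketch of unsupervised centroid formation: identifying values join
// the currently open centroid in order of arrival until it holds k ids,
// and each id lives in one and only one centroid.
public class CentroidFormation {

    static List<Set<String>> unsupervised(List<String> idStream, int k) {
        List<Set<String>> centroids = new ArrayList<>();
        Set<String> open = new LinkedHashSet<>();
        Set<String> placed = new HashSet<>();
        for (String id : idStream) {
            if (placed.contains(id)) continue; // id already belongs to a centroid
            open.add(id);
            placed.add(id);
            if (open.size() == k) {            // centroid is complete: freeze it
                centroids.add(open);
                open = new LinkedHashSet<>();
            }
        }
        return centroids; // a non-full open centroid would be suppressed at the end
    }

    public static void main(String[] args) {
        // Order of arrival decides the grouping, as in Fig. 2(b).
        List<String> stream = Arrays.asList("U1", "U2", "U1", "U3", "U4");
        System.out.println(unsupervised(stream, 2)); // prints "[[U1, U2], [U3, U4]]"
    }
}
```

In the supervised mode, `centroids` would instead be precomputed from the (partially) known frequency distribution of the identifying values and passed in as the set \(\mathcal {V}\).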
Lemma 1
Given a transactional stream S, safe clustering ensures the safe release of QI-groups in the published version \(S^{*}\).
Proof
Since (k, l)-clustering is applied, \(\forall QI_i,QI_j\), two QI-groups released from C, \(\pi _{A^{id}}QI_i = \pi _{A^{id}}QI_j = V_{id}(C)\) where \(|V_{id}(C)| = k\). Alternatively, since (k, l)-clustering ensures that an identifying value exists in one and only one centroid, \(\forall C_1,C_2\), two distinct clusters over \(S^{*}\), \(V_{id}(C_1) \cap V_{id}(C_2) =\emptyset \) can be written as \(\pi _{A^{id}}QI_1 \cap \pi _{A^{id}}QI_2= \emptyset \) where \(QI_1,QI_2\) are two QI-groups released respectively from \(C_1\) and \(C_2\). Hence, the intersection of any QI-groups in \(S^{*}\) on the identifying values yields either k identifying values or none.
4.3 (k, l)Clustering Algorithm
In this section, we present our (k, l)-clustering algorithm applied to a transactional data stream. The main idea behind it is to process incoming tuples on the fly while guaranteeing the safe release of l-diverse QI-groups. It requires two privacy constants k and l, the stream S, and a set of centroids \(\mathcal {V}\). (k, l)-clustering outputs an anonymized data stream. The algorithm is composed of two main steps: safe clustering and tuple assignment.
4.4 Safe Clustering
The function assigns tuples to their corresponding clusters based on their identifying values.
4.5 Tuple Assignment
It assigns a tuple \(t_{p}\) to the selected cluster \(C_{sel}\) as follows: in a given cluster, all tuples are distributed over multiple subgroups. A subgroup must contain at least k distinct identifying values before its l-diversity is verified.
After processing the entire stream, the algorithm publishes all subgroups that are neither l-diverse nor have reached size k (i.e., those stored in the temp structure) by suppressing their identifying values. This guarantees the privacy constraints but impacts the utility of the dataset.
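Putting the two steps together, a minimal end-to-end sketch could look like the following. This is our own simplification of the algorithm, assuming given centroids and reducing the temp structure to per-cluster buffers:

```java
import java.util.*;

// End-to-end sketch of the two steps: tuples are routed to their cluster
// by identifying value (safe clustering), buffered until the subgroup
// holds k distinct ids and is l-diverse (tuple assignment), then released
// as a QI-group; leftovers are published with identifying values suppressed.
public class KLClusteringSketch {

    static List<List<String[]>> anonymize(List<String[]> stream, int k, int l,
                                          List<Set<String>> centroids) {
        Map<Set<String>, List<String[]>> buffers = new LinkedHashMap<>();
        for (Set<String> c : centroids) buffers.put(c, new ArrayList<>());
        List<List<String[]>> released = new ArrayList<>();

        for (String[] t : stream) {                 // t = {id, sensitive}
            for (Set<String> c : centroids) {
                if (!c.contains(t[0])) continue;    // safe clustering step
                List<String[]> buf = buffers.get(c);
                buf.add(t);                         // tuple assignment step
                if (distinctIds(buf) == k && lDiverse(buf, l)) {
                    released.add(new ArrayList<>(buf)); // release a QI-group
                    buf.clear();
                }
                break;
            }
            // tuples whose id matches no centroid are dropped in this sketch
        }
        // Remaining buffers: publish with identifying values suppressed.
        for (List<String[]> buf : buffers.values())
            for (String[] t : buf)
                released.add(Collections.singletonList(new String[]{"*", t[1]}));
        return released;
    }

    static int distinctIds(List<String[]> buf) {
        Set<String> ids = new HashSet<>();
        for (String[] t : buf) ids.add(t[0]);
        return ids.size();
    }

    static boolean lDiverse(List<String[]> buf, int l) {
        Map<String, Integer> freq = new HashMap<>();
        for (String[] t : buf) freq.merge(t[1], 1, Integer::sum);
        for (int c : freq.values()) if (c * l > buf.size()) return false;
        return true;
    }

    public static void main(String[] args) {
        List<Set<String>> centroids = new ArrayList<>();
        centroids.add(new LinkedHashSet<>(Arrays.asList("U1", "U2")));
        List<String[]> stream = Arrays.asList(
            new String[]{"U1", "Paris"}, new String[]{"U2", "Lyon"},
            new String[]{"U1", "Nice"});
        // The first two tuples form a released QI-group; the third remains
        // buffered and is published with its id suppressed.
        System.out.println(anonymize(stream, 2, 2, centroids).size()); // prints "2"
    }
}
```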
5 Experiments
In this section, we evaluate the efficiency of our unsupervised and supervised (k, l)-clustering techniques by conducting a set of experiments detailed hereinafter. The algorithm is implemented in Java and tested on a PC with a 2.20 GHz Intel Core i7 CPU and 8.0 GB RAM.
To simulate a data stream scenario, we used a rental transaction dataset composed of 109,763 tuples where each tuple is associated with a timestamp used only for evaluation purposes. We assume that at each time instant exactly one tuple arrives; as a result, timestamps range from 1 to \(|S|\). The dataset contains 2,374 distinct identifying values.
We designed two sets of experiments to examine the effectiveness of our approach in terms of utility:

Evaluating the percentage of suppressed identifying values.

Evaluating the retention delay of tuples in the queue before being released in QI-groups.
5.1 Percentage of Suppressed Identifying Values
As previously stated, after processing the stream over a specified interval of time, our algorithm suppresses the identifying values in the QI-groups that are neither l-diverse nor of size k.
Using the unsupervised (k, l)-clustering, we vary the value of k from 3 to 8 and examine the percentage of suppressed values, with the parameter l set to 3. For high values of k, the percentage of suppressed values increases, reaching almost 60% for k = 8, as shown in Fig. 3. Here, we cluster identifying values based on their order of arrival, and the k individuals clustered together might not have the same distribution over the stream. Therefore, when k increases, it becomes more difficult to form QI-groups, leading to an increase in the amount of suppressed values. Hence, we did not evaluate the unsupervised approach for k values higher than 8.
Using the supervised (k, l)-clustering, we ensure that the most frequent identifying values are clustered and then grouped together in the QI-groups. Consequently, we suppress fewer identifying values and thus obtain better utility, as shown in Fig. 3, where the percentage of suppressed values reaches 1% for k = 20.
5.2 Retention of Tuples
A tuple is retained in the queue if it remains (a) in a subgroup that did not reach size k or (b) in the temporary subgroup of the corresponding cluster.
For each set of {k, l} values, we measure the retention delay of each tuple in memory. Then we compute the average delay time of all the tuples. This value is chosen as the delay constraint \(\delta \) defined in [28].
We consider a tuple that remains in memory longer than the specified delay \(\delta \) a “delayed or outdated tuple”; \(\delta \) varies slightly with k. We applied our algorithm to the same rental dataset used before, adopting both approaches, as shown in Fig. 4. The delay constraint can be chosen depending on the data stream application’s requirements regarding the availability of the anonymized tuples, as stated in [28].
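The retention metric described above can be sketched as follows (the array-based representation and method names are ours):

```java
// Sketch of the retention metric: the delay of a tuple is the gap between
// its arrival timestamp and the timestamp at which its QI-group is
// released; the average delay over all tuples serves as the constraint
// delta, and tuples exceeding delta count as "delayed or outdated".
public class RetentionMetric {

    // arrival[i] and release[i] are the timestamps of tuple i.
    static double averageDelay(long[] arrival, long[] release) {
        long sum = 0;
        for (int i = 0; i < arrival.length; i++) sum += release[i] - arrival[i];
        return (double) sum / arrival.length;
    }

    static int delayedTuples(long[] arrival, long[] release, double delta) {
        int count = 0;
        for (int i = 0; i < arrival.length; i++)
            if (release[i] - arrival[i] > delta) count++;
        return count;
    }

    public static void main(String[] args) {
        long[] arrival = {1, 2, 3, 4};
        long[] release = {3, 3, 9, 5};
        double delta = averageDelay(arrival, release);           // (2+1+6+1)/4 = 2.5
        System.out.println(delta + " -> " + delayedTuples(arrival, release, delta)); // prints "2.5 -> 1"
    }
}
```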
6 Conclusion
In this paper, we have defined new privacy properties to address the correlation problem in the anonymization of transactional data streams. A bucketization-based technique, called (k, l)-clustering, is proposed to enforce these privacy properties. (k, l)-clustering processes incoming tuples on the fly. It continuously groups k distinct individuals into l-diverse QI-groups and ensures that these individuals remain grouped together in future releases of QI-groups. We evaluated our algorithm in terms of utility by considering two approaches: supervised and unsupervised. We showed, by conducting a set of experiments, that both approaches cope well with the streaming nature of the data while respecting the privacy constraints. The supervised approach yielded better results because it has a partial or full view over the distribution of identifying values in the dataset.
References
Samarati, P.: Protecting respondents’ identities in microdata release. IEEE Trans. Knowl. Data Eng. 13(6), 1010–1027 (2001)
Sweeney, L.: k-anonymity: a model for protecting privacy. Int. J. Uncertain. Fuzziness Knowl.-Based Syst. 10(5), 557–570 (2002)
Campan, A., Cooper, N., Truta, T.M.: On-the-fly generalization hierarchies for numerical attributes revisited. In: Jonker, W., Petković, M. (eds.) SDM 2011. LNCS, vol. 6933, pp. 18–32. Springer, Heidelberg (2011). https://doi.org/10.1007/978-3-642-23556-6_2
He, Y., Naughton, J.F.: Anonymization of set-valued data via top-down, local generalization. Proc. VLDB Endow. 2(1), 934–945 (2009)
Anjum, A., Raschia, G.: BangA: an efficient and flexible generalization-based algorithm for privacy preserving data publication. Computers 6(1), 1 (2017)
Xiao, X., Tao, Y.: Anatomy: simple and effective privacy preservation. In: Proceedings of the 32nd International Conference on Very Large Data Bases (VLDB 2006), Seoul, Korea (2006)
Li, T., Li, N., Zhang, J., Molloy, I.: Slicing: a new approach for privacy preserving data publishing. IEEE Trans. Knowl. Data Eng. 24(3), 561–574 (2012)
Ciriani, V., De Capitani Di Vimercati, S., Foresti, S., Jajodia, S., Paraboschi, S., Samarati, P.: Combining fragmentation and encryption to protect privacy in data storage. ACM Trans. Inf. Syst. Secur. 13, 22:1–22:33 (2010)
Manolis, T., Nikos, M., John, L., Spiros, S.: Privacy preservation by disassociation. Proc. VLDB Endow. 5(10), 944–955 (2012)
Wang, K., Wang, P., Fu, A.W., Wong, R.C.: Generalized bucketization scheme for flexible privacy settings. Inf. Sci. 348, 377–393 (2016)
Wong, R.C., Fu, A.W., Wang, K., Yu, P., Jian, P.: Can the utility of anonymized data be used for privacy breaches? ACM Trans. Knowl. Discov. Data 5(3), 16:1–16:24 (2011)
Cormode, G., Li, N., Li, T., Srivastava, D.: Minimizing minimality and maximizing utility: analyzing methodbased attacks on anonymized data. Proc. VLDB Endow. 3, 1045–1056 (2010)
Kifer, D.: Attacks on privacy and de Finetti’s theorem. In: SIGMOD Conference, pp. 127–138 (2009)
Al Bouna, B., Clifton, C., Malluhi, Q.M.: Efficient sanitization of unsafe data correlations. In: Proceedings of the Workshops of the EDBT/ICDT 2015 Joint Conference (EDBT/ICDT), Brussels, Belgium, pp. 278–285 (2015)
Li, T., Li, N.: Injector: mining background knowledge for data anonymization. In: ICDE, pp. 446–455 (2008)
Al Bouna, B., Clifton, C., Malluhi, Q.: Using safety constraint for transactional dataset anonymization. In: Wang, L., Shafiq, B. (eds.) DBSec 2013. LNCS, vol. 7964, pp. 164–178. Springer, Heidelberg (2013). https://doi.org/10.1007/978-3-642-39256-6_11
Al Bouna, B., Clifton, C., Malluhi, Q.M.: Anonymizing transactional datasets. J. Comput. Secur. 23(1), 89–106 (2015)
Gong, Q., Luo, J., Yang, M., Ni, W., Li, X.I.: Anonymizing 1:M microdata with high utility. Knowl.-Based Syst. 115(Suppl. C), 15–26 (2017)
Lu, J., Wang, P., Zhao, L., Yang, J.: Sanatomy: privacy preserving publishing of data streams via anatomy. In: 2010 Third International Symposium on Information Processing (ISIP). IEEE (2010)
Yazdani, N., Amiri, F., Shakery, A.: Bottom-up sequential anonymization in the presence of adversary knowledge. Inf. Sci. 405, 316–335 (2018)
Cao, J., Carminati, B., Ferrari, E., Tan, K.: Castle: continuously anonymizing data streams. IEEE Trans. Dependable Secur. Comput. 8(3), 337–352 (2011)
Zhao, L., Wang, P., Lu, J., Yang, J.: B-CASTLE: an efficient publishing algorithm for k-anonymizing data streams. In: 2010 Second WRI Global Congress on Intelligent Systems (GCIS). IEEE (2011)
Zakerzadeh, H., Osborn, S.L.: FAANST: fast anonymizing algorithm for numerical streaming data. In: Garcia-Alfaro, J., Navarro-Arribas, G., Cavalli, A., Leneutre, J. (eds.) DPM/SETOP 2010. LNCS, vol. 6514, pp. 36–50. Springer, Heidelberg (2011). https://doi.org/10.1007/978-3-642-19348-4_4
Guo, K., Zhang, Q.: Fast clustering-based anonymization approaches with time constraints for data streams. Knowl.-Based Syst. 46, 95–108 (2013)
Noferesti, M., Mohammadian, E., Jalili, R.: FAST: fast anonymization of big data streams. In: Proceedings of the 2014 International Conference on Big Data Science and Computing (BigDataScience 2014). ACM (2014)
Shakery, A., Amiri, F., Yazdani, N., Chinaei, A.H.: Hierarchical anonymization algorithms against background knowledge attack in data releasing. Knowl.-Based Syst. 101, 71–89 (2016)
DomingoFerrer, J., SoriaComas, J.: Steered microaggregation: a unified primitive for anonymization of data sets and data streams. In: 2017 IEEE International Conference on Data Mining Workshops (ICDMW). IEEE (2017)
Ghafoor, A., Pervaiz, Z., Aref, W.G.: Precisionbounded access control using slidingwindow query views for privacypreserving data streams. IEEE Trans. Knowl. Data Eng. 27, 1992–2004 (2015)
Bonomi, L., Xiong, L.: On differentially private longest increasing subsequence computation in data stream. Trans. Data Priv. 9, 73–100 (2016)
Nie, Y., et al.: Geospatial streams publish with differential privacy. In: Wang, S., Zhou, A. (eds.) CollaborateCom 2016. LNICST, vol. 201, pp. 152–164. Springer, Cham (2017). https://doi.org/10.1007/978-3-319-59288-6_14
Liu, X., et al.: On efficient and robust anonymization for privacy protection on massive streaming categorical information. IEEE Trans. Dependable Secur. Comput. 14, 507–520 (2017)
Dwork, C., McSherry, F., Nissim, K., Smith, A.: Calibrating noise to sensitivity in private data analysis. In: Halevi, S., Rabin, T. (eds.) TCC 2006. LNCS, vol. 3876, pp. 265–284. Springer, Heidelberg (2006). https://doi.org/10.1007/11681878_14
© 2018 Springer Nature Switzerland AG
Tekli, J., Al Bouna, B., Bou Issa, Y., Kamradt, M., Haraty, R. (2018). (k, l)-Clustering for Transactional Data Streams Anonymization. In: Su, C., Kikuchi, H. (eds.) Information Security Practice and Experience. ISPEC 2018. Lecture Notes in Computer Science, vol. 11125. Springer, Cham. https://doi.org/10.1007/978-3-319-99807-7_35