Data-Driven Policy Impact Evaluation, pp. 67–83
Privacy in Microdata Release: Challenges, Techniques, and Approaches
Abstract
Releasing and disseminating useful microdata while ensuring that no personal or sensitive information is improperly exposed is a complex problem, heavily investigated by the scientific community in the past couple of decades. Various microdata protection approaches have then been proposed, achieving different privacy requirements through appropriate protection techniques. This chapter discusses the privacy risks that can arise in microdata release and illustrates some well-known privacy-preserving techniques and approaches.
Keywords
Microdata Release; Microdata Protection; Differential Privacy (DP); Protection Techniques; Protection Approach
1 Introduction
We live in a society that relies more and more on the availability of data to make knowledge-based decisions (Livraga, 2015). The benefits that can be driven by data sharing and dissemination have been widely recognized for a long time now (Foresti, 2011; Livraga, 2015), and are visible to everybody: for instance, medical research is a simple example of a field that, leveraging analysis of real clinical trials made available by hospitals, can improve the life quality of individuals. At the same time, many laws and regulations have recognized that privacy is a primary right of citizens, acknowledging the principle that sensitive information (e.g., personal information that refers to an individual) must be protected from improper disclosure. To resolve the tension between the (equally strong) needs for data privacy and availability, the scientific community has been devoting major efforts for decades to investigating models and approaches that can allow a data owner to release a data collection guaranteeing that sensitive information be properly protected, while still allowing useful analysis to be performed (Bezzi et al., 2012; De Capitani di Vimercati et al., 2011b).
In the past, data were typically released in the form of aggregate statistics (macrodata): while providing a first layer of protection to the individuals to whom the statistics pertain, as no specific data of single respondents (i.e., the individuals to whom data items refer) are (apparently) disclosed (De Capitani di Vimercati et al., 2011a), releasing precomputed statistics inevitably limits the analysis that a recipient can do. To provide recipients with greater flexibility in performing analysis, many situations require the release of detailed data, called microdata. Indeed, since analyses are not precomputed, more freedom is left to the final recipients. The downside, however, comes in terms of major privacy concerns, as microdata can include sensitive information precisely related to individuals.
As will be illustrated in this chapter, the first attempts towards the development of microdata protection approaches pursued what today are typically called syntactic privacy guarantees (Ciriani et al., 2007a; Clifton and Tassa, 2013; De Capitani di Vimercati et al., 2012). Traditional protection approaches (e.g., k-anonymity (Samarati, 2001) and its variations) operate by removing and/or generalizing (i.e., making less precise/more general) all information that can identify a respondent, so that each respondent is hidden in a group of individuals sharing the same identifying information. In this way, it is not possible to precisely link an individual to her (sensitive) information. Existing solutions following this approach can be used to protect respondents’ identities as well as their sensitive information (Livraga, 2015), also in emerging scenarios (De Capitani di Vimercati et al., 2015b). Alternative approaches based on the notion of differential privacy (Dwork, 2006) have then been proposed. Trying to pursue a relaxed and microdata-adapted version of a well-known definition of privacy by Dalenius (1977), that anything that can be learned about a respondent from a statistical database should be learnable without access to the database, differential privacy aims at ensuring that the inclusion in a dataset of the information of an individual does not significantly alter the outcome of analysis of the dataset. To achieve its privacy goal, differential privacy typically relies on controlled noise addition, thus perturbing the data to be released (in contrast to k-anonymity-like solutions that, operating through generalization, guarantee data truthfulness).
There has been a major debate in the scientific community regarding which approach (syntactic techniques versus differential privacy) is the “correct” one (Clifton and Tassa, 2013; Kifer and Machanavajjhala, 2011), and recent studies have pointed out that, while they pursue different privacy goals through different protection techniques, both approaches are successfully applicable to different scenarios, and there is room for both of them (Clifton and Tassa, 2013; Li et al., 2012a), possibly jointly adopted (Soria-Comas et al., 2014). Both approaches have in fact been used in different application scenarios, ranging from the protection of location data (e.g., Peng et al. 2016; Xiao and Xiong 2015), to privacy-preserving data mining (e.g., Ciriani et al. 2008; Li et al. 2012c), and to the private analysis of social network data (e.g., Tai et al. 2014; Wang et al. 2016), just to name a few.
The goal of this chapter is to illustrate some of the best-known protection techniques and approaches that can be used to ensure microdata privacy. The remainder of this chapter is organized as follows. Section 2 presents the basic concepts behind the problem of microdata protection, illustrating possible privacy risks and available protection techniques. Section 3 discusses some well-known protection approaches. Section 4 illustrates some extensions of the traditional approaches, proposed to relax or remove some assumptions for use in advanced scenarios, with a specific focus on the problem of protecting microdata coming from multiple sources. Finally, Sect. 5 concludes the chapter.
2 Microdata Protection: Basic Concepts
This section illustrates the key concepts behind the problem of protecting microdata privacy. It discusses firstly some privacy issues that can arise in microdata release (Sect. 2.1), and secondly the protection techniques that have been proposed by the research community to protect microdata (Sect. 2.2).
2.1 Microdata Privacy

Identifiers: attributes that uniquely identify a respondent (e.g., Name and SSN).

Quasi-identifiers (QI): attributes that, in combination, can be linked to external information to re-identify (all or some of) the respondents to whom information refers, or to reduce the uncertainty over their identities (e.g., DoB, Sex, and ZIP).

Sensitive attributes: attributes that represent information that should be kept confidential (e.g., Disease).

Identity disclosure, occurring whenever the identity of a respondent can be somehow determined and associated with a (de-identified) tuple in the released microdata table.

Attribute disclosure, occurring when a (sensitive) attribute value can be associated with an individual (without necessarily being able to link the value to a specific tuple).
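The role of the quasi-identifier in a disclosure can be illustrated with a minimal sketch of a linking attack. The two tables, the names, and the Disease values below are hypothetical and chosen only for illustration; the attribute names follow the examples used in this section.

```python
# Hypothetical de-identified release: identifiers removed, QI and sensitive attribute kept.
released = [
    {"DoB": "1958/12/11", "Sex": "F", "ZIP": "10180", "Disease": "hypertension"},
    {"DoB": "1970/03/02", "Sex": "M", "ZIP": "10039", "Disease": "gastritis"},
    {"DoB": "1970/03/02", "Sex": "M", "ZIP": "10039", "Disease": "dermatitis"},
]

# Hypothetical external source that includes identities (e.g., a public register).
external = [
    {"Name": "Alice Doe", "DoB": "1958/12/11", "Sex": "F", "ZIP": "10180"},
    {"Name": "Bob Roe", "DoB": "1970/03/02", "Sex": "M", "ZIP": "10039"},
]

QI = ("DoB", "Sex", "ZIP")


def link(released, external, qi):
    """For each external record, collect the sensitive values of the
    released tuples that share the same quasi-identifier values."""
    matches = {}
    for person in external:
        key = tuple(person[a] for a in qi)
        matches[person["Name"]] = [
            t["Disease"] for t in released if tuple(t[a] for a in qi) == key
        ]
    return matches


result = link(released, external, QI)
```

In this toy example the first external record matches exactly one released tuple, so both the identity and the sensitive value are disclosed; the second matches two tuples, so some uncertainty remains.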
2.2 Protection Techniques

Non-perturbative techniques do not directly modify the original data, but remove details from the microdata table: they sacrifice data completeness by releasing possibly imprecise and/or incomplete data to preserve data truthfulness. Examples of non-perturbative techniques include suppression, generalization, and bucketization. Suppression selectively removes information from the microdata table. Generalization, possibly based on ad hoc generalization hierarchies, selectively replaces the content of some cells in the microdata table (e.g., a complete date of birth) with more general values (e.g., year of birth). Bucketization operates on sets of attributes whose joint visibility should be prevented (e.g., the name and the disease of a patient), and operates by first partitioning tuples in buckets and attributes in groups, and then shuffling the semi-tuples within buckets so as to break their correspondence (De Capitani di Vimercati et al., 2015a, 2010; Li et al., 2012b; Xiao and Tao, 2006).

Perturbative techniques distort the microdata table to be released by modifying its informative content, hence sacrificing data truthfulness. Examples of perturbative techniques include noise addition and microaggregation. Noise addition intuitively adds controlled noise to the original data collection. Protection is provided by the fact that some values (or combinations among them) included in the released table might not correspond to real ones, and vice versa. Microaggregation (originally proposed for continuous numerical data and then extended also to categorical data (Torra, 2004)) selectively replaces original tuples with new ones. It operates by first clustering the tuples in the original microdata table in groups of a certain cardinality in such a way that tuples in the same cluster are similar to each other, and then by replacing the tuples in a cluster with a representative one computed through an aggregation operator (e.g., mean or median).
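As an illustration of a perturbative technique, the following is a minimal sketch of microaggregation for a single numerical attribute. It assumes a simple fixed-size strategy (sort, cut into runs of k values, replace each run with its mean); this strategy and the function name `microaggregate` are assumptions made for the example, not a method prescribed by the chapter.

```python
from statistics import mean


def microaggregate(values, k):
    """Univariate fixed-size microaggregation sketch.

    Sort the values, cut them into consecutive groups of k, and replace
    every value with the mean of its group. A short trailing group is
    merged into the previous one, so every group has at least k values.
    """
    if k < 1 or len(values) < k:
        raise ValueError("need at least k values and k >= 1")
    order = sorted(range(len(values)), key=lambda i: values[i])
    groups = [order[i:i + k] for i in range(0, len(order), k)]
    if len(groups) > 1 and len(groups[-1]) < k:
        groups[-2].extend(groups.pop())
    protected = [None] * len(values)
    for group in groups:
        representative = mean(values[i] for i in group)
        for i in group:
            protected[i] = representative
    return protected


ages = [23, 25, 61, 24, 60, 59, 58]
protected = microaggregate(ages, 3)
```

With the mean as aggregation operator, the total of the attribute is preserved while individual values are no longer truthful.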
The protection techniques illustrated above can be adopted to effectively protect the confidentiality of a microdata collection to be released. Given a data collection to be protected and released, some key questions then need to be answered: what technique should be used? Should a combination of techniques be preferred to a single one? To which portion of the data (e.g., the entire table, a subset of tuples, and a subset of attributes) should the technique be applied? Whatever the answers to these questions, an important observation is that all microdata protection techniques cause an inevitable information loss: non-perturbative techniques produce datasets that are not as complete or as precise as the originals, and perturbative techniques produce datasets that are distorted. For these reasons, the scientific community has recently developed protection approaches that, given a privacy requirement to be satisfied (e.g., the protection of the identities of the microdata respondents), rely on a controlled adoption of some of these microdata protection techniques to protect privacy while limiting information loss, as illustrated in the remainder of this chapter.
3 Microdata Protection Approaches
This section illustrates the most important protection approaches that have driven research in microdata protection in the past couple of decades, together with the privacy requirements they pursue and the microdata protection techniques (see Sect. 2) that are typically adopted for their enforcement.
3.1 k-Anonymity
The first and pioneering approach for protecting microdata against identity disclosure is represented by k-anonymity (Samarati, 2001), enforcing a protection requirement typically applied by statistical agencies that demands that any released information be indistinguishably related to no fewer than a certain number k of respondents. Following the assumption that re-identification of de-identified microdata takes advantage of QI attributes, this general requirement is translated into the k-anonymity requirement: each release of data must be such that every combination of values of the QI can be indistinctly matched to at least k respondents (Samarati, 2001). A microdata table satisfies the k-anonymity requirement iff each tuple cannot be related to fewer than k individuals in the population, and vice versa (i.e., each individual in the population cannot be related to fewer than k tuples in the table). These two conditions hold since the original definition of k-anonymity assumes that each respondent is represented by at most one tuple in the released table and vice versa (i.e., each tuple includes information related to one respondent only).
Verifying the satisfaction of the k-anonymity requirement would require knowledge of all existing external sources of information that an adversary might use for the linking attack. This assumption is indeed unrealistic in practice, and therefore k-anonymity takes the safe approach of requiring that each respondent be indistinguishable from at least k − 1 other respondents in the released microdata. A table is therefore said to be k-anonymous if each combination of values of the QI appears in it with either zero or at least k occurrences. For instance, the table in Fig. 1a is 1-anonymous if we assume the QI to be composed of DoB, Sex, and ZIP, since at least one combination of their values (i.e., 〈1958/12/11, F, 10180〉) appears only once in the table (i.e., in the eleventh tuple). Since each combination of QI values is shared by at least k different tuples in the microdata table, each respondent cannot be associated with fewer than k tuples in the released table and vice versa, also satisfying the original k-anonymity requirement (the definition of a k-anonymous table being a sufficient, though not necessary, condition for the satisfaction of the k-anonymity requirement).
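The table-level definition above (every QI combination occurs zero or at least k times) translates directly into a counting check. The sketch below is illustrative only: the rows and the helper names are hypothetical, and the table is assumed to be a list of dictionaries.

```python
from collections import Counter


def class_sizes(table, qi):
    """Count how many tuples share each combination of QI values."""
    return Counter(tuple(row[a] for a in qi) for row in table)


def is_k_anonymous(table, qi, k):
    """True if every QI combination present in the table occurs at least k times."""
    return all(size >= k for size in class_sizes(table, qi).values())


def anonymity_level(table, qi):
    """Largest k for which a non-empty table is k-anonymous."""
    return min(class_sizes(table, qi).values())


QI = ("DoB", "Sex", "ZIP")
table = [
    {"DoB": "1955/09", "Sex": "F", "ZIP": "100**", "Disease": "helicobacter"},
    {"DoB": "1955/09", "Sex": "F", "ZIP": "100**", "Disease": "gastritis"},
    {"DoB": "1958/12/11", "Sex": "F", "ZIP": "10180", "Disease": "hypertension"},
]

level = anonymity_level(table, QI)
level_without_outlier = anonymity_level(table[:2], QI)
```

Here the single tuple with QI value 〈1958/12/11, F, 10180〉 makes the whole table only 1-anonymous; without it, the remaining tuples form one equivalence class of size 2.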
Traditional approaches to enforcing k-anonymity operate on QI attributes by modifying their values in the microdata to be released, while leaving sensitive and non-sensitive attributes as they are (recall that direct identifiers are removed from the microdata as the first step). Among the possible data protection techniques that might be enforced on the QI, k-anonymity typically relies on the combined adoption of generalization and suppression, which have the advantage of preserving data truthfulness when compared to perturbative techniques (e.g., noise addition; see Sect. 2.2). Suppression is used to couple generalization, as it can help in reducing the amount of generalization that has to be enforced to achieve k-anonymity; in this way, it is possible to produce more precise (though incomplete) tables. The intuitive rationale is that, if a microdata table includes a limited number of outliers (i.e., QI values with fewer than k occurrences) that would force a large amount of generalization to satisfy k-anonymity, these outliers could be more conveniently removed from the table, improving the quality of the released data.
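The interplay between generalization and suppression can be sketched with a naive exhaustive search. This is not one of the published k-anonymity algorithms: the generalization hierarchies (truncating dates, masking ZIP digits), the use of the sum of levels as a crude measure of information loss, and all names are assumptions made for the example.

```python
from collections import Counter
from itertools import product


def gen_dob(value, level):
    """'YYYY/MM/DD' -> 'YYYY/MM' -> 'YYYY' -> '*' for levels 0..3."""
    return [value, value[:7], value[:4], "*"][level]


def gen_zip(value, level):
    """Mask the last `level` digits of a ZIP code."""
    return value[:len(value) - level] + "*" * level


def generalize(row, levels):
    dob, sex, zip_code = row
    return (gen_dob(dob, levels[0]), sex, gen_zip(zip_code, levels[1]))


def anonymize(rows, k, max_suppressed):
    """Try generalization levels from least to most general and return the
    first one that needs at most `max_suppressed` outlier tuples removed."""
    for levels in sorted(product(range(4), range(6)), key=sum):
        generalized = [generalize(r, levels) for r in rows]
        sizes = Counter(generalized)
        kept = [g for g in generalized if sizes[g] >= k]
        if len(generalized) - len(kept) <= max_suppressed:
            return levels, kept
    return None


rows = [
    ("1955/09/10", "F", "10039"),
    ("1955/09/21", "F", "10045"),
    ("1955/09/02", "F", "10032"),
    ("1955/12/30", "F", "10045"),
    ("1958/12/11", "F", "10180"),
]
levels, kept = anonymize(rows, k=2, max_suppressed=1)
```

On these hypothetical rows, allowing one suppressed tuple lets the search stop at year-level dates and four-digit ZIP prefixes; the outlier born in 1958 is removed instead of forcing further generalization of every other tuple.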
3.2 ℓ-Diversity and t-Closeness

Homogeneity attack. A homogeneity attack occurs when all the tuples in an equivalence class (i.e., the set of tuples with the same value for the QI) in a k-anonymous table assume the same value for the sensitive attribute. If a data recipient knows the QI value of a target individual x, she can identify the equivalence class representing x, and then discover the value of x’s sensitive attribute. For instance, consider the 4-anonymous table in Fig. 2 and suppose that a recipient knows that Gloria is a female living in the 10039 area and born on 1955/09/10. Since all the tuples in the equivalence class with QI value equal to 〈1955/09, ∗, 100∗∗〉 assume value helicobacter for attribute Disease, the recipient can infer that Gloria suffers from a helicobacter infection.

External knowledge attack. The external knowledge attack occurs when the data recipient possesses some additional knowledge (not included in the k-anonymous table) about a target respondent x, and can use it to reduce the uncertainty about the value of x’s sensitive attribute. For instance, consider the 4-anonymous table in Fig. 2 and suppose that a recipient knows that a neighbor, Mina, is a female living in the 10045 area and born on 1955/12/30. Observing the 4-anonymous table, the recipient can infer only that the neighbor suffers from dermatitis, retinitis, or gastritis. Suppose now that the recipient sees Mina tanning without screens at the park every day: due to this external information, the recipient can rule out the possibility that Mina suffers from dermatitis or retinitis, and infer that she suffers from gastritis.
Computing an ℓ-diverse table minimizing the loss of information caused by generalization and suppression is computationally hard. However, since ℓ-diversity basically requires computing a k-anonymous table (with additional constraints on the sensitive values), any algorithm proposed for computing a k-anonymous table that minimizes loss of information can be adapted to also guarantee ℓ-diversity, simply by controlling whether or not the condition on the diversity of the sensitive attribute values is satisfied by all the equivalence classes (Machanavajjhala et al., 2007). As a last remark on ℓ-diversity, it might be possible to obtain ℓ-diverse tables by departing from generalization and adopting instead a bucketization-based approach (see Sect. 2.2), for instance, by adopting the Anatomy approach (Xiao and Tao, 2006), or other (possibly more general) techniques (Ciriani et al., 2012; De Capitani di Vimercati et al., 2014, 2015a, 2010).
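The per-class diversity check mentioned above can be sketched as follows. The sketch assumes the simplest reading of the condition, namely that each equivalence class contains at least ℓ distinct sensitive values; other instantiations of ℓ-diversity exist and are not covered. The rows are hypothetical and only mimic the homogeneity and external knowledge examples.

```python
from collections import defaultdict


def is_distinct_l_diverse(table, qi, sensitive, l):
    """True if every equivalence class has at least l distinct sensitive values."""
    values_per_class = defaultdict(set)
    for row in table:
        values_per_class[tuple(row[a] for a in qi)].add(row[sensitive])
    return all(len(values) >= l for values in values_per_class.values())


QI = ("DoB", "Sex", "ZIP")

# Hypothetical equivalence class where every tuple has the same disease.
homogeneous = [
    {"DoB": "1955/09", "Sex": "*", "ZIP": "100**", "Disease": "helicobacter"}
    for _ in range(4)
]

# Hypothetical equivalence class with three distinct diseases over four tuples.
mixed = [
    {"DoB": "1955/12", "Sex": "*", "ZIP": "100**", "Disease": d}
    for d in ("dermatitis", "retinitis", "gastritis", "gastritis")
]

homogeneous_ok = is_distinct_l_diverse(homogeneous, QI, "Disease", 2)
mixed_ok = is_distinct_l_diverse(mixed, QI, "Disease", 3)
whole_ok = is_distinct_l_diverse(homogeneous + mixed, QI, "Disease", 2)
```

A k-anonymization algorithm could call such a check on each candidate solution and reject those for which it fails.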
Although ℓ-diversity represents a first step in counteracting attribute disclosure, an ℓ-diverse table might still be vulnerable to information leakage caused by skewness attacks (where significant differences can be seen in the frequency distribution of the sensitive values within an equivalence class with respect to that of the same values in the overall population), and similarity attacks (where the ℓ sensitive values of the tuples in an equivalence class are semantically similar, although syntactically different) (Li et al., 2007). To counteract these two disclosure risks, it is possible to rely on the definition of t-closeness (Li et al., 2007), requiring that the frequency distribution of the sensitive values in each equivalence class be close (i.e., with distance smaller than a fixed threshold t) to that in the released microdata table.
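A t-closeness check compares, for each equivalence class, the distribution of the sensitive values with that of the whole released table. The sketch below uses the variational (total variation) distance as the distance measure; this choice is an assumption made to keep the example short, and other distance measures can be plugged in. The table is hypothetical.

```python
from collections import Counter, defaultdict


def distribution(values):
    """Relative frequency of each value in a list."""
    counts = Counter(values)
    return {v: c / len(values) for v, c in counts.items()}


def variational_distance(p, q):
    """Half the sum of absolute differences between two distributions."""
    return 0.5 * sum(abs(p.get(v, 0) - q.get(v, 0)) for v in set(p) | set(q))


def max_class_distance(table, qi, sensitive):
    """Largest distance between a class distribution and the table-wide one."""
    overall = distribution([row[sensitive] for row in table])
    per_class = defaultdict(list)
    for row in table:
        per_class[tuple(row[a] for a in qi)].append(row[sensitive])
    return max(
        variational_distance(distribution(vals), overall)
        for vals in per_class.values()
    )


table = [
    {"ZIP": "100**", "Disease": "gastritis"},
    {"ZIP": "100**", "Disease": "gastritis"},
    {"ZIP": "101**", "Disease": "gastritis"},
    {"ZIP": "101**", "Disease": "dermatitis"},
]
worst = max_class_distance(table, ("ZIP",), "Disease")
```

The table would then be accepted for a given threshold t only if the returned value is smaller than t.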
3.3 Differential Privacy
Differential privacy (DP) is a recent privacy definition that departs from the guarantees and enforcement techniques characterizing k-anonymity and its extensions, and aims to guarantee that the release of a dataset does not disclose sensitive information about any individual, who may or may not be represented therein (Dwork, 2006). DP aims at releasing a dataset permitting the disclosure of properties about the population as a whole (rather than the microdata themselves), while protecting the privacy of single individuals. The privacy guarantee provided by DP relies on ensuring that the probability of a recipient correctly inferring the sensitive value of a target respondent x is not affected by the presence or absence of x’s tuple in the released dataset.
DP can be adopted either to respond to queries (interactive scenario) issued against a microdata table or to produce a sanitized dataset to be released (non-interactive scenario). In the interactive scenario, DP is ensured by adding random noise to the query results evaluated on the original dataset (Dwork et al., 2006), sacrificing data truthfulness. Unfortunately, the interactive scenario limits the analysis that the recipient can perform, as it allows only a limited number of queries to be answered (Soria-Comas et al., 2014). In the non-interactive scenario, a dataset is produced and released, typically based on the evaluation of histogram queries (i.e., counting the number of records having a given value). To reduce information leakage, these counts are computed through a DP mechanism.
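A noisy histogram of the kind described above can be sketched with Laplace noise calibrated to the sensitivity of the counts, in the spirit of Dwork et al. (2006). The privacy parameter `epsilon` is the standard DP parameter and is an assumption here, since the text does not name it; the sketch also assumes that neighboring datasets differ by the insertion or removal of one tuple, so that a single count changes by at most 1. It is an illustration, not a hardened implementation.

```python
import random
from collections import Counter


def laplace_noise(scale, rng):
    """Sample Laplace(0, scale) as the difference of two exponentials
    that each have mean `scale`."""
    return rng.expovariate(1 / scale) - rng.expovariate(1 / scale)


def noisy_histogram(values, domain, epsilon, rng):
    """Return a noisy count for every value in `domain`.

    Adding or removing one tuple changes one count by 1, so the
    sensitivity is 1 and the noise scale is 1 / epsilon.
    """
    counts = Counter(values)
    return {v: counts[v] + laplace_noise(1 / epsilon, rng) for v in domain}


diseases = ["gastritis", "gastritis", "dermatitis", "retinitis"]
domain = ["dermatitis", "gastritis", "helicobacter", "retinitis"]
rng = random.Random(0)  # seeded only to make the example repeatable
released_counts = noisy_histogram(diseases, domain, epsilon=0.5, rng=rng)
```

Note that the noisy counts are real numbers and can be negative; values absent from the data (here helicobacter) also receive a noisy count, since omitting them would itself reveal information.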
Unlike k-anonymity and its variations, which guarantee a certain degree of privacy to the microdata to be released, DP aims to guarantee that the release mechanism \(\mathcal{K}\) (e.g., the algorithm adopted to compute the data to be released, whether query answers in the interactive scenario or sanitized counts in the non-interactive scenario) is safe with respect to privacy breaches. A dataset to be released satisfies DP if the removal/insertion of one tuple from/to the dataset does not significantly affect the result of the evaluation of \(\mathcal{K}\). In this way, the protection offered by DP lies in the fact that the impact that a respondent has on the outcome of a certain analysis (or on the generation of the sanitized dataset) remains negligible. In fact, DP guarantees that the probability of observing a result for the evaluation of \(\mathcal{K}\) over T is close to the probability of observing that result for the evaluation of \(\mathcal{K}\) over a dataset T′ differing from T in one tuple only.
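The notion of "close" probabilities is usually made precise through a parameter \(\varepsilon > 0\). The symbol \(\varepsilon\) and the set \(S\) below are not introduced in the text above and are stated here as the standard formulation following Dwork (2006): a mechanism \(\mathcal{K}\) is \(\varepsilon\)-differentially private if, for all datasets T and T′ differing in one tuple and for every set \(S\) of possible outputs,

```latex
\[
  \Pr[\mathcal{K}(T) \in S] \;\le\; e^{\varepsilon} \cdot \Pr[\mathcal{K}(T') \in S].
\]
```

Since the condition must hold with the roles of T and T′ exchanged, the two probabilities are within a factor \(e^{\varepsilon}\) of each other; smaller values of \(\varepsilon\) correspond to stronger protection and, typically, more noise.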
DP offers strong privacy guarantees at the price of imposing strict conditions on what kind of, and how, data can be released (Clifton and Tassa, 2013). In addition, the amount of noise that needs to be adopted can significantly distort the released data (Clifton and Tassa, 2013; Fredrikson et al., 2014; Soria-Comas et al., 2014), thus limiting in practice their utility for final recipients. Some relaxations of DP have therefore been proposed (e.g., Dwork and Smith 2009; Mironov et al. 2009), possibly applicable to specific real-world scenarios (e.g., Hong et al. 2015), with the aim of finding a reasonable trade-off between privacy protection and data utility.
It is interesting to note that a recent approach has been proposed using k-anonymity and DP approaches together, with the aim of reducing the amount of noise needed to ensure DP (Soria-Comas et al., 2014). The proposal builds on the observation that, given a microdata table T and a query q for which the outputs are required to be differentially private, if the query is run on a microaggregation-based (see Sect. 3.1) k-anonymous version \(T_k\) of T, the amount of noise to be added to the output of q for achieving DP is greatly reduced (compared with the noise that would be needed if q were run on the original T). To this end, microaggregation should be performed carefully so that it can be considered insensitive to the input data (i.e., for any pair of datasets T and T′ differing by one tuple, given the clusters \(\{c_1, \ldots, c_n\}\) produced by the microaggregation over T and the clusters \(\{c^{\prime}_1, \ldots, c^{\prime}_n\}\) produced by the microaggregation over T′, each pair of corresponding clusters differs in at most one tuple). This is a key property required for the microaggregation to succeed in reducing the noise that will then be employed to ensure DP, as it reduces the sensitivity of the query to be executed (Soria-Comas et al., 2014) and hence the result distortion. This approach can also be used in the non-interactive scenario. To this end, a k-anonymous version \(T_k\) of T is first built through an insensitive microaggregation. The differentially private dataset \(T_{DP}\) is then built by collating the n differentially private answers to a set of n queries (with n the number of tuples in \(T_k\)), where the ith query (i = 1, …, n) aims at retrieving the ith tuple in \(T_k\).
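The intuition behind the reduced sensitivity can be illustrated on a single cluster. The following is a simplified worked bound under stated assumptions, not the full analysis of Soria-Comas et al. (2014): a cluster contains exactly k values \(x_1, \ldots, x_k\) of a numerical attribute bounded in \([a, b]\), its representative is the mean \(\bar{c}\), and one value \(x_j\) is replaced by \(x^{\prime}_j\). The symbols a, b, and \(\varepsilon\) are introduced only for this illustration.

```latex
\[
  \bar{c} = \frac{1}{k}\sum_{i=1}^{k} x_i,
  \qquad
  \bar{c}^{\prime} = \bar{c} + \frac{x^{\prime}_j - x_j}{k}
  \quad\Longrightarrow\quad
  \lvert \bar{c}^{\prime} - \bar{c} \rvert \;\le\; \frac{b - a}{k}.
\]
```

A query returning one original value can change by up to \(b - a\) when a tuple changes, whereas the cluster representative changes by at most \((b - a)/k\). With Laplace noise calibrated to sensitivity, the noise scale for this single representative would therefore drop from \((b - a)/\varepsilon\) to \((b - a)/(k\varepsilon)\). The bound covers one cluster only and says nothing about how many clusters change at once, which is what the insensitivity property of the microaggregation is meant to control.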
4 Extensions for Advanced Scenarios
The traditional microdata protection approaches in the literature (see Sect. 3) are built on specific assumptions that can limit their applicability to certain scenarios. For instance, they assume the data to be released in a single table, completely available for anonymization before release, and never republished. However, it may happen that data are either republished over time or continuously generated, as in the case with data streams: recent proposals (e.g., Fung et al. 2008; Loukides et al. 2013; Shmueli and Tassa 2015; Shmueli et al. 2012; Tai et al. 2014; Xiao and Tao 2007) have extended traditional approaches to deal with these scenarios.
One of the assumptions on which the original formulations of k-anonymity, DP, and their extensions were based is that the microdata to be anonymized are stored in a single table. This assumption represents a limitation in many real-world scenarios, in which the information that needs to be released can be spread across various datasets, and where the privacy goal is that all released information be effectively protected. There are two naive approaches that one might think of adopting: join-and-anonymize and anonymize-and-join. The first approach, in which all tables to be released are first joined in a universal relation that is then anonymized by adopting one of the traditional approaches, might not work whenever there is no single subject authorized to see and join all original relations, which might be owned by different authorities. The second approach (i.e., first anonymize each table individually and then release the join among the sanitized versions of all tables) does not guarantee appropriate protection: for instance, if a QI is spread across multiple tables, it could not be effectively anonymized by looking at each relation individually. The scientific community has recently started looking at this problem, and some solutions have been proposed (typically extending k-anonymity and its variations) to address the multiple tables scenario.
A first distinction has to be made depending on whether the multiple tables to be released belong to the same authority (e.g., different relations of a single database) that therefore has a complete view over them, or the tables belong to different authorities, where no subject in the picture has a global view of the entire informative content that needs to be released. In the first scenario, a careful join-and-anonymize approach might do. However, the anonymization has to be performed with extreme care to avoid vulnerability to privacy breaches. For instance, assume n relations, owned by the same authority, to be released together provided that k-anonymity is satisfied by their join. When computing the join among the n relations, it might be possible that the k-anonymity assumption of one respondent being represented by a single tuple is not satisfied (as different tuples could be related to the same respondent). The risk here is that (some of) the different tuples related to the same individual are “anonymized together”: hence, an equivalence class of size k might refer to fewer than k respondents, violating their privacy despite the relation being apparently k-anonymous. To overcome this issue, MultiR k-anonymity (Nergiz et al., 2007) has been proposed to extend the definition of k-anonymity and ℓ-diversity to multiple relations belonging to a snowflake database schema.
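The risk of tuples of the same individual being "anonymized together" can be made visible by counting distinct respondents, rather than tuples, in each equivalence class of the join. The sketch below assumes the data owner can still see a respondent key before release; the attribute name `pid` and the rows are hypothetical. It only detects the problem and is not the MultiR k-anonymity definition.

```python
from collections import defaultdict


def class_summary(rows, qi, respondent_id):
    """For each equivalence class, return (number of tuples, number of
    distinct respondents those tuples refer to)."""
    tuples = defaultdict(int)
    people = defaultdict(set)
    for row in rows:
        key = tuple(row[a] for a in qi)
        tuples[key] += 1
        people[key].add(row[respondent_id])
    return {key: (tuples[key], len(people[key])) for key in tuples}


# Hypothetical join result: respondent p1 contributes two tuples.
joined = [
    {"pid": "p1", "DoB": "1955", "ZIP": "100**", "Disease": "gastritis"},
    {"pid": "p1", "DoB": "1955", "ZIP": "100**", "Disease": "dermatitis"},
    {"pid": "p2", "DoB": "1958", "ZIP": "101**", "Disease": "retinitis"},
    {"pid": "p3", "DoB": "1958", "ZIP": "101**", "Disease": "gastritis"},
]

k = 2
summary = class_summary(joined, ("DoB", "ZIP"), "pid")
unsafe = [key for key, (_, respondents) in summary.items() if respondents < k]
```

Both classes contain two tuples, so the join looks 2-anonymous, yet the first class refers to a single respondent.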
When the relations to be anonymized belong to different authorities, it is clearly not possible to join them beforehand. One might think of first anonymizing each relation individually and then joining the obtained results on the (anonymized) QI. Unfortunately, this strategy is not trivial: besides possibly exploding in size, the join could not be used for meaningful analysis, as many of its tuples would be incorrect (joining over the anonymized QI would join more tuples than using the original values). Some approaches have recently been proposed to address this issue. For instance, distributed k-anonymity (DkA (Jiang and Clifton, 2006)) proposes a distributed framework for achieving k-anonymity. The applicability of this approach is limited to two relations (defined as two views over a global data collection), which can be correctly joined through a 1:1 join on a common key. The framework builds a k-anonymous join of the two datasets, without disclosing any information from one site to the other. In a nutshell, the approach works iteratively in three steps: (1) each data holder produces a k-anonymous version of her own dataset; (2) each data holder checks whether or not joining the obtained k-anonymous datasets would maintain global k-anonymity; and (3) if so, join and release, otherwise go back to step 1 and further generalize the original data. Checking the global anonymity (step 2) is a critical task, as it requires the two parties to exchange their anonymized tables. To avoid information leakage, encryption is adopted and, in this regard, the price to be paid for this approach is in terms of the required encryption and decryption overhead (Jiang and Clifton, 2006; Mohammed et al., 2011). Recent efforts devoted to enforcing DP in a multi-relational setting (Mohammed et al., 2014) (also focusing on two relations only) should also be highlighted. The solution in Mohammed et al. (2011) instead does not pose assumptions on the number of relations to be joined but requires active cooperation among the parties holding the relations to achieve k-anonymity. In addition, the approach in Mohammed et al. (2011) can be successfully extended to provide privacy beyond k-anonymity (e.g., by ensuring ℓ-diversity). Finally, it should be noted that specific approaches have also been proposed to protect different tables that need to be sequentially released (Wang and Fung, 2006).
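The three-step loop described for distributed k-anonymity can be sketched in a deliberately simplified, non-secure form. This is not the protocol of Jiang and Clifton (2006): the global check of step 2 is computed here in the clear, whereas the actual framework relies on encryption precisely to avoid that; each site is assumed to hold a single string-valued QI attribute keyed by a common identifier; generalization is assumed to mask trailing characters; and the rule of alternating which site generalizes further is an arbitrary choice. All names are hypothetical.

```python
from collections import Counter


def mask(value, level):
    """Generalize a string by masking its last `level` characters."""
    keep = max(len(value) - level, 0)
    return value[:keep] + "*" * (len(value) - keep)


def view(data, level):
    """Generalized version of one site's data: key -> masked value."""
    return {key: mask(value, level) for key, value in data.items()}


def is_k_anon(values, k):
    return all(count >= k for count in Counter(values).values())


def local_step(data, level, k, max_level):
    """Step 1: raise the level until the site's own view is k-anonymous."""
    while level < max_level and not is_k_anon(view(data, level).values(), k):
        level += 1
    return level


def two_site_sketch(site_a, site_b, k, max_level):
    level_a = level_b = turn = 0
    while True:
        level_a = local_step(site_a, level_a, k, max_level)
        level_b = local_step(site_b, level_b, k, max_level)
        view_a, view_b = view(site_a, level_a), view(site_b, level_b)
        # Step 2, done in the clear here (the real framework uses encryption).
        joined = [(view_a[key], view_b[key]) for key in site_a]
        if is_k_anon(joined, k):
            return level_a, level_b, joined          # step 3: join and release
        if level_a >= max_level and level_b >= max_level:
            return None                              # cannot generalize further
        # Otherwise go back to step 1 with one site generalized further.
        if (turn % 2 == 0 and level_a < max_level) or level_b >= max_level:
            level_a += 1
        else:
            level_b += 1
        turn += 1


zips = {1: "10039", 2: "10032", 3: "10045", 4: "10041"}   # held by site A
years = {1: "1955", 2: "1958", 3: "1955", 4: "1958"}      # held by site B
level_a, level_b, joined = two_site_sketch(zips, years, k=2, max_level=5)
```

In this toy run both local views are 2-anonymous after the first pass, but their join is not, so one more round of generalization on the ZIP codes is needed before release.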
5 Conclusions
This chapter has addressed the problem of protecting privacy in microdata release. After a discussion of the privacy risks that can arise when microdata need to be shared or disseminated, some of the best-known microdata protection techniques and approaches developed by the scientific community have been illustrated. Some recent extensions of traditional approaches, proposed to fit advanced scenarios, have also been highlighted.
Footnotes
1. In this chapter, SSN, DoB, and ZIP are attributes representing Social Security Numbers (the de facto US identification number for taxation and other purposes), dates of birth, and ZIP codes (US postal codes).
Notes
Acknowledgements
This paper is based on joint work with Sabrina De Capitani di Vimercati, Sara Foresti, and Pierangela Samarati, whom the author would like to thank. This work was supported in part by the European Commission through the Seventh Framework Programme under grant agreement 312797 (ABC4EU) and through the Horizon 2020 programme under grant agreement 644579 (ESCUDO-CLOUD).
References
 Bayardo RJ, Agrawal R (2005) Data privacy through optimal kanonymization. In: Proceedings of ICDE 2005, Tokyo, April 2005Google Scholar
 Bezzi M, De Capitani di Vimercati S, Foresti S, Livraga G, Samarati P, Sassi R (2012) Modeling and preventing inferences from sensitive value distributions in data release. J Comput Secur 20(4):393–436CrossRefGoogle Scholar
 Ciriani V, De Capitani di Vimercati S, Foresti S, Samarati P (2007) kanonymity. In: Yu T, Jajodia S (eds) Secure data management in decentralized systems. Springer, BerlinGoogle Scholar
 Ciriani V, De Capitani di Vimercati S, Foresti S, Samarati P (2007) Microdata protection. In: Yu T, Jajodia S (eds) Secure data management in decentralized systems. Springer, BerlinGoogle Scholar
 Ciriani V, De Capitani di Vimercati S, Foresti S, Samarati P (2008) kAnonymous data mining: a survey. In: Aggarwal C, Yu P (eds) Privacypreserving data mining: models and algorithms. Springer, BerlinGoogle Scholar
 Ciriani V, De Capitani di Vimercati S, Foresti S, Livraga G, Samarati P (2012) An OBDD approach to enforce confidentiality and visibility constraints in data publishing. J Comput Secur 20(5):463–508CrossRefGoogle Scholar
 Clifton C, Tassa T (2013) On syntactic anonymity and differential privacy. Trans Data Priv 6(2):161–183Google Scholar
 Dalenius T (1977) Towards a methodology for statistical disclosure control. Statistik Tidskrift 15:429–444
 De Capitani di Vimercati S, Foresti S, Jajodia S, Paraboschi S, Samarati P (2010) Fragments and loose associations: respecting privacy in data publishing. Proc VLDB Endow 3(1):1370–1381
 De Capitani di Vimercati S, Foresti S, Livraga G, Samarati P (2011) Anonymization of statistical data. Inform Technol 53(1):18–25
 De Capitani di Vimercati S, Foresti S, Livraga G, Samarati P (2011) Protecting privacy in data release. In: Aldini A, Gorrieri R (eds) Foundations of security analysis and design VI. Springer, Berlin
 De Capitani di Vimercati S, Foresti S, Livraga G, Samarati P (2012) Data privacy: definitions and techniques. Int J Uncertainty Fuzziness Knowl Based Syst 20(6):793–817
 De Capitani di Vimercati S, Foresti S, Jajodia S, Livraga G, Paraboschi S, Samarati P (2014) Fragmentation in presence of data dependencies. IEEE Trans Dependable Secure Comput 11(6):510–523
 De Capitani di Vimercati S, Foresti S, Jajodia S, Livraga G, Paraboschi S, Samarati P (2015) Loose associations to increase utility in data publishing. J Comput Secur 23(1):59–88
 De Capitani di Vimercati S, Foresti S, Livraga G, Paraboschi S, Samarati P (2015) Privacy in pervasive systems: social and legal aspects and technical solutions. In: Colace F, Santo MD, Moscato V, Picariello A, Schreiber F, Tanca L (eds) Data management in pervasive systems. Springer, Berlin
 Domingo-Ferrer J, Torra V (2005) Ordinal, continuous and heterogeneous k-anonymity through microaggregation. Data Min Knowl Disc 11(2):195–212
 Dwork C (2006) Differential privacy. In: Proceedings of ICALP 2006, Venice, July 2006
 Dwork C, Smith A (2009) Differential privacy for statistics: what we know and what we want to learn. J Priv Confid 1(2):135–154
 Dwork C, McSherry F, Nissim K, Smith A (2006) Calibrating noise to sensitivity in private data analysis. In: Proceedings of TCC 2006, New York, NY, March 2006
 Federal Committee on Statistical Methodology (2005) Statistical policy working paper 22 (Second Version). Report on statistical disclosure limitation methodology, December 2005
 Foresti S (2011) Preserving privacy in data outsourcing. Springer, Berlin
 Fredrikson M, Lantz E, Jha S, Lin S, Page D, Ristenpart T (2014) Privacy in pharmacogenetics: an end-to-end case study of personalized warfarin dosing. In: Proceedings of the 23rd USENIX security symposium, San Diego, August 2014
 Fung BCM, Wang K, Fu AWC, Pei J (2008) Anonymity for continuous data publishing. In: Proceedings of EDBT 2008, Nantes, March 2008
 Golle P (2006) Revisiting the uniqueness of simple demographics in the US population. In: Proceedings of WPES 2006, Alexandria, October 2006
 Hong Y, Vaidya J, Lu H, Karras P, Goel S (2015) Collaborative search log sanitization: toward differential privacy and boosted utility. IEEE Trans Dependable Secure Comput 12(5):504–518
 Jiang W, Clifton C (2006) A secure distributed framework for achieving k-anonymity. VLDB J 15(4):316–333
 Kifer D, Machanavajjhala A (2011) No free lunch in data privacy. In: Proceedings of SIGMOD 2011, Athens, June 2011
 LeFevre K, DeWitt D, Ramakrishnan R (2005) Incognito: efficient full-domain k-anonymity. In: Proceedings of SIGMOD 2005, Baltimore, June 2005
 LeFevre K, DeWitt D, Ramakrishnan R (2006) Mondrian multidimensional k-anonymity. In: Proceedings of ICDE 2006, Atlanta, April 2006
 Li N, Li T, Venkatasubramanian S (2007) t-closeness: privacy beyond k-anonymity and ℓ-diversity. In: Proceedings of ICDE 2007, Istanbul
 Li N, Qardaji W, Su D (2012) On sampling, anonymization, and differential privacy or, k-anonymization meets differential privacy. In: Proceedings of ASIACCS 2012, Seoul, May 2012
 Li T, Li N, Zhang J, Molloy I (2012) Slicing: a new approach for privacy preserving data publishing. IEEE Trans Knowl Data Eng 24(3):561–574
 Li Y, Chen M, Li Q, Zhang W (2012) Enabling multilevel trust in privacy preserving data mining. IEEE Trans Knowl Data Eng 24(9):1598–1612
 Livraga G (2015) Protecting privacy in data release. Springer, Berlin
 Loukides G, Gkoulalas-Divanis A, Shao J (2013) Efficient and flexible anonymization of transaction data. Knowl Inform Syst 36(1):153–210
 Machanavajjhala A, Kifer D, Gehrke J, Venkitasubramaniam M (2007) ℓ-Diversity: privacy beyond k-anonymity. ACM Trans Knowl Discov from Data 1(1):3:1–3:52
 Mironov I, Pandey O, Reingold O, Vadhan S (2009) Computational differential privacy. In: Proceedings of CRYPTO 2009, Santa Barbara, August 2009
 Mohammed N, Fung BC, Debbabi M (2011) Anonymity meets game theory: secure data integration with malicious participants. VLDB J 20(4):567–588
 Mohammed N, Alhadidi D, Fung BC, Debbabi M (2014) Secure two-party differentially private data release for vertically partitioned data. IEEE Trans Dependable Secure Comput 11(1):59–71
 Nergiz M, Clifton C, Nergiz A (2007) Multirelational k-anonymity. In: Proceedings of ICDE 2007, Istanbul
 Peng T, Liu Q, Meng D, Wang G (2017) Collaborative trajectory privacy preserving scheme in location-based services. Inform Sci 387:165–179. Available online
 Samarati P (2001) Protecting respondents’ identities in microdata release. IEEE Trans Knowl Data Eng 13(6):1010–1027
 Shmueli E, Tassa T (2015) Privacy by diversity in sequential releases of databases. Inform Sci 298:344–372
 Shmueli E, Tassa T, Wasserstein R, Shapira B, Rokach L (2012) Limiting disclosure of sensitive data in sequential releases of databases. Inform Sci 191:98–127
 Soria-Comas J, Domingo-Ferrer J, Sánchez D, Martínez S (2014) Enhancing data utility in differential privacy via microaggregation-based k-anonymity. VLDB J 23(5):771–794
 Tai CH, Tseng PJ, Yu PS, Chen MS (2014) Identity protection in sequential releases of dynamic networks. IEEE Trans Knowl Data Eng 26(3):635–651
 Torra V (2004) Microaggregation for categorical variables: a median based approach. In: Proceedings of PSD 2004, Barcelona, June 2004
 Wang K, Fung B (2006) Anonymizing sequential releases. In: Proceedings of KDD 2006, Philadelphia, August 2006
 Wang Q, Zhang Y, Lu X, Wang Z, Qin Z, Ren K (2016) Real-time and spatio-temporal crowd-sourced social network data publishing with differential privacy. IEEE Trans Dependable Secure Comput (in press)
 Xiao X, Tao Y (2006) Anatomy: simple and effective privacy preservation. In: Proceedings of VLDB 2006, Seoul, September 2006
 Xiao X, Tao Y (2007) m-Invariance: towards privacy preserving republication of dynamic datasets. In: Proceedings of SIGMOD 2007, Beijing, June 2007
 Xiao Y, Xiong L (2015) Protecting locations with differential privacy under temporal correlations. In: Proceedings of CCS 2015, Denver, October 2015
Copyright information
Open Access This chapter is licensed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license and indicate if changes were made.
The images or other third party material in this chapter are included in the chapter's Creative Commons license, unless indicated otherwise in a credit line to the material. If material is not included in the chapter's Creative Commons license and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder.