Differentially private data aggregating with relative error constraint

Privacy-preserving methods that support data aggregation have attracted the attention of researchers in multidisciplinary fields. Among these methods, differential privacy (DP) has become an influential privacy mechanism owing to its rigorous privacy guarantee and high data utility. However, DP places no bound on the noise it adds, which can lead to low utility. Recently, researchers have investigated how to preserve a rigorous privacy guarantee while limiting the relative error to a fixed bound. However, these schemes destroy the statistical properties, including the mean, variance and MSE, which are the foundational elements for data aggregation and analysis. In this paper, we explore the optimal privacy-preserving solution, including novel definitions and implementing mechanisms, to maintain the statistical properties while satisfying DP with a fixed relative error bound. Experimental evaluation demonstrates that our mechanism outperforms current schemes in terms of security and utility for large quantities of queries.


Introduction
In data-driven applications such as location-based services (LBSs), disease surveillance and social networks, information fusion is necessary for data owners to obtain better services [1]. For example, in location-based applications, aggregating a user's precise positions can be used to provide better shopping recommendations and route planning services. In disease surveillance, gathering individuals' physical data can help prevent the outbreak of some diseases.
As the above examples suggest, information fusion has outstanding benefits for knowledge discovery and acquisition. However, the aggregated data may contain individuals' sensitive information (e.g., home address, health condition). Untreated data may disclose individuals' privacy, and data owners may be reluctant to release their true data values due to privacy concerns. Therefore, privacy-preserving data fusion has become a substantial issue in data aggregation and mining [2][3][4].
Early privacy-preserving schemes inherently rely on the security guarantees of their designed algorithms [5][6][7]. Their privacy strength is determined by the algorithm design rather than by a mathematical foundation, so the security and effectiveness of such algorithms are hard to analyze mathematically or to prove theoretically. To remedy this problem, DP, proposed by Dwork [8,9], has a solid mathematical foundation and places no restrictions on the adversary's background knowledge. It is a privacy protection method that strictly defines the strength of protection and data utility. Because DP provides a complete theoretical guarantee of privacy together with better data availability, it has become a significant privacy-preserving framework in recent years.
The idea of DP is to preserve an individual's privacy by introducing perturbation. Specifically, the noise scale is determined by the privacy strength. A stronger privacy level gives better protection but injects more noise into the data to be protected, leading to lower data utility (e.g., clustering performance or prediction accuracy). In real-world applications, data curators always hope that the error of the data uploaded by users is sufficiently small, because aggregated data with large errors may strongly affect the accuracy of mining results [10]. For example, a user who wants to protect his/her precise location uploads perturbed positions to the service provider (SP) to obtain LBSs, but the positions that the SP collects may have large errors and destroy the data utility, as illustrated in Fig. 1 and Example 1.
Example 1 Consider a scenario in which Amy is at the bank and wants to query the restaurants within a one-kilometer radius of her true position x. To preserve her location privacy, Amy uses DP to perturb her true position x and obtains a perturbed position x′. Amy then sends x′ to the SP and increases the query radius (e.g., to 2 km) so that she can filter out the results she wants. In this case, after aggregating perturbed positions, the SP may want to use these data for analysis and mining tasks (e.g., clustering or classification). However, some positions that Amy uploaded may have large errors, which greatly affect the analysis and mining results. In some extreme cases, data uploaded by users may be useless for a mining task.
The above example illustrates that good protection needs high-level noise, but high-level noise may harm data availability. Perturbed data that contain too much noise strongly affect analysis and mining results. If we can limit the noise to an acceptable range while satisfying the requirement of DP, this dilemma can be avoided. The goal of this paper is to preserve DP while limiting the noise error to a fixed bound.
Current approaches attempt to solve this problem in two natural ways. One category is "privacy-first" mechanisms, whose idea is to reduce the error as much as possible while meeting the privacy requirement of DP. The other category is "accuracy-first" methods, which first limit the error to a fixed bound and then design the noise form to meet the requirement of DP.
Although various prioritization solutions have been proposed for differentially private release with constrained error, current schemes still face the following challenges:
• Violating DP: Accuracy-first methods limit the error to a fixed bound, but the noise generated by these methods does not always conform to the Laplacian distribution. Since only Laplacian noise could meet the privacy requirement of DP strictly, accuracy-first methods violate DP.
• Changing Statistical Properties: Privacy-first schemes try to improve the accuracy of published results under the condition of satisfying DP. However, they affect the statistical properties, especially the variance and Mean Square Error (MSE), which are the fundamental elements for data analysis and mining.
These challenges imply that a novel error-constrained mechanism for differentially private data release is in high demand. For the first challenge, to strictly satisfy the requirement of DP, we generate the needed noise in a truncated form. The truncated distribution can satisfy DP while limiting the error to a fixed bound. For the second challenge, we observe that a linear transformation of a random variable from a certain distribution still conforms to the same distribution family. To maintain the statistical properties, a linear transformation can therefore be applied to generate noise that satisfies the privacy and utility requirements.
Based on these considerations, we propose a relative error constrained differentially private data release solution. We first give the notions of RE-DP and RE-Geo-indistinguishability as the definitions of error-constrained DP in one and two dimensions, respectively. Then we use truncated mechanisms and linear transformations to realize these two notions in practice. To the best of our knowledge, our solution is the first DP technique that uses a linear transformation to maintain the statistical properties (mean, variance and MSE) under constrained noise error. Our contributions are threefold:
• Error Constrained DP: Standard DP places no constraint on the error. We extend DP to the notions of RE-DP and RE-Geo-indistinguishability, which extend DP with an error constraint in one and two dimensions, respectively. They limit the noise error to a fixed bound while preserving ε-DP.
• RE-DP: We propose a mechanism to realize RE-DP in one dimension, including a truncated Laplace mechanism that generates noise satisfying DP under a relative error constraint, and a linear transformation that corrects the deviation of the mean, variance and MSE. In this way, our scheme does not change the statistical properties under the error constraint while preserving DP.
• RE-Geo-indistinguishability: Beyond the one-dimensional case, we extend our mechanism to two dimensions, which are commonly used for location data protection. We propose a truncated planar Laplace mechanism and a linear transformation method. The truncated planar Laplace mechanism limits the error to a fixed bound, and the linear transformation keeps the statistical properties unchanged.
The rest of this paper is arranged as follows. Section "Related work" introduces the mechanisms associated with our work. Notations and preliminaries adopted in this work are described in Section "Notations and preliminaries". Section "RE-DP" presents the notion of RE-DP and the corresponding implementation mechanism in one dimension. The extension from one dimension to two dimensions, including the notion of RE-geo-indistinguishability and its realizing scheme, is described in Section "RE-geo-indistinguishability". The experimental evaluation is reported in Section "Experimental evaluation", and conclusions and future work are given in Section "Conclusions and future works".

Related work
Existing error-constrained differentially private data aggregation mechanisms can be classified into two categories. The first is accuracy-first methods, which first set a fixed noise bound and then design noise whose size is within the bound to satisfy the requirement of DP. The second is privacy-first methods, which first make the noise satisfy the requirement of DP and then improve the data utility by a variety of means.

Privacy-first
When publishing various sensitive data (such as search logs [11,12] and machine learning results [13][14][15]), many technologies exist for enforcing DP. Nonetheless, none of these technologies can optimize the errors of the published data; they cannot reduce the variance of noisy results or improve application-specific indicators such as classification accuracy [16]. Next, we discuss some typical privacy-first methods.
Barak et al. [17] designed a technique to publish the marginals of a given data set. Their purpose is not only to improve the accuracy of published data but also to make noisy results more user-friendly. Their method guarantees that each marginal count is non-negative and that all marginals are consistent, that is, the sums of the marginal counts agree with one another. Extending the scenario of Barak et al. [17], Kasiviswanathan et al. [18] proved several lower bounds on the relative error of noisy marginal counts. However, they did not propose any specific algorithm for releasing marginals. In contrast, Blum et al. [19] presented a specific mechanism to publish one-dimensional data, which performs well on relative error for various counting queries even in the worst case. Moreover, [20] proposed a method to improve the performance range of Blum et al.'s scheme. To address multidimensional data, Xiao et al. [21] designed a method for multidimensional data publishing.
Li et al. [22] summarized the methods in [20,21]. They propose an optimal scheme that can minimize the relative error of any given query. In fact, the method of Li et al. can also handle marginal data, since marginal counts can be regarded as the results of specific count queries. Nonetheless, because these methods only decrease the relative error of count data, they still produce a large relative error for small counts.
Privacy-first mechanisms enforce that the noise satisfies differential privacy and attempt to minimize the relative error of the noise to improve data utility. Nevertheless, the optimal results do not always meet the accuracy requirement. In addition, the deviations of the mean, variance and MSE, which have a significant effect on mining results, are not discussed in these works.

Accuracy-first
Ligett et al. [23] proposed a theoretical framework that puts accuracy first. It achieves high utility while preserving privacy for empirical risk minimization (ERM). The framework searches the space of privacy levels to find the empirically strongest one whose algorithm satisfies the accuracy requirement, while incurring only logarithmic overhead in the number of privacy levels searched.
To apply the framework proposed by Ligett et al. [23] in practice, Arinjita et al. [24] improved the utility of publication methods based on Kronecker graphs. Instead of anonymizing the social network as a whole, they anonymize each cluster of the network separately and combine the sanitized results afterwards.
Kasra et al. [25] extended the scenario of Arinjita et al.'s work [24]. They used the error bounds they developed to provide guidelines for calibrating privacy levels so that the filter error stays within pre-specified bounds.
The most relevant work is the truncated Gaussian mechanism proposed by Liu [26]. According to the definition of global sensitivity in DP, she demonstrates that the widely used Laplacian distribution is a specific instance of the generalized Gaussian (GG) distribution family. She discusses how the GG mechanism meets the theoretical requirements of DP under a pre-specified privacy degree, and studies the relationships between the GG distribution and the Laplacian distribution. In her work, she indicates that the statistical properties can be the same as those of the original data only if the bound is symmetrical, but she does not give a method to keep the properties unchanged.

Table 1
Category | Scheme | Main idea | Query type
Privacy-first | Barak [17] | Publish the marginals of a given data set | Sum
Privacy-first | Kasiviswanathan [18] | Prove lower bounds of relative error | Count
Privacy-first | Blum [19] | Publish single dimensional data | Count
Privacy-first | Hay [20] | Improve performance range of Blum [19] | Histogram
Privacy-first | Xiao [21] | Publish multidimensional data | Count
Privacy-first | Li [22] | Minimize the relative error of any given query | Count
Accuracy-first | Ligett [23] | Find the empirically strongest privacy level that meets the accuracy requirement | ERM
Accuracy-first | Arinjita [24] | Anonymize each cluster of the network separately | Social network
Accuracy-first | Kasra [25] | Keep filter error within pre-specified bounds | Stream
Accuracy-first | Liu [26] | Generalized Gaussian (GG) distribution family | Sum
Our solution | | Apply a linear transformation to maintain the statistical properties | Count and Histogram
The aforementioned schemes limit the error to a fixed bound and then change the form of the noise to meet the privacy requirement of DP. However, these methods risk violating DP.

Summary
Privacy and accuracy are always in tension in private data publishing. As shown in Table 1, state-of-the-art methods try to resolve this tension from two directions. Nonetheless, privacy-first schemes cannot guarantee sufficient utility for a given relative error. Although accuracy-first methods preserve good utility, the noise they generate may not conform to the Laplacian distribution, and the statistical properties are changed. Therefore, our goal in this paper is to provide a practical solution for aggregating data with constrained relative error while preserving comprehensive privacy. Specifically, we aim to address the following challenges:
• How to define relative error constrained DP in one and two dimensions, respectively?
• How to guarantee the privacy requirement of DP with relative error constrained noise?
• How to maintain the statistical properties (mean, variance and MSE) when error-constrained noise is used?

Notations and preliminaries
This section introduces the basic concepts needed for our problem. We first give the necessary notation. We then describe the preliminaries of DP and its implementation mechanism. Finally, we present Geo-indistinguishability, an extension of DP to two dimensions.

Notations
We use separate notation for the one- and two-dimensional settings in this paper. In one dimension, we use a dataset D to denote the sensitive instances.
To protect sensitive data, the data owner applies a randomized perturbation mechanism M to generate and publish a perturbed query result Q = M(D). In two dimensions, we use p and p′ to denote the true and perturbed two-dimensional data, respectively. Table 2 lists the main notations used in this paper.

Differential privacy
DP [8] is a state-of-the-art privacy preservation model that guarantees indistinguishability. Essentially, it is a noise-perturbation privacy-preserving mechanism. By adding perturbation to raw data or statistical results, DP guarantees that changing a single record's value has minimal effect on the output. Thus, DP can preserve the privacy of the data to be protected while still supporting mining tasks well. Definition 1 gives its formal form.
Definition 1 (ε-DP [8]) Consider two adjacent datasets D and D′ that have the same size but differ in one record to be protected. A random perturbation mechanism M satisfies ε-DP if, for every set of results S,

$$\Pr[M(D) \in S] \le e^{\varepsilon} \times \Pr[M(D') \in S],$$

where S ⊆ Range(M), Range(M) is the value range of the random algorithm M, Pr[•] denotes probability, and ε is the privacy budget parameter.
A smaller ε corresponds to a higher level of privacy.
The scaling parameter λ of the Laplace noise is determined by the sensitivity Δf and the privacy protection intensity ε:

$$\lambda = \frac{\Delta f}{\varepsilon},$$

where Δf is the largest effect of a single record on the statistical results.
For example, consider a dataset whose sensitivity is 1. Based on the concept of DP, noise (added to the real answer) distributed according to Lap(1/ε) is enough to guarantee ε-DP.
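The Laplace mechanism described above can be sketched in a few lines. This is a minimal illustration, not the authors' implementation: the function names `laplace_noise` and `laplace_mechanism` are hypothetical, and the noise is drawn as the difference of two unit exponential variates, which is one standard way to obtain a Laplace sample.

```python
import random

def laplace_noise(sensitivity, epsilon, rng=random):
    """Draw one sample from Lap(sensitivity / epsilon).

    Names and signature are illustrative assumptions. The difference of two
    independent Exp(1) variates follows a standard Laplace distribution.
    """
    if sensitivity <= 0 or epsilon <= 0:
        raise ValueError("sensitivity and epsilon must be positive")
    scale = sensitivity / epsilon  # lambda = Delta f / epsilon
    return scale * (rng.expovariate(1.0) - rng.expovariate(1.0))

def laplace_mechanism(true_answer, sensitivity, epsilon, rng=random):
    """Return the true answer perturbed with Laplace noise."""
    return true_answer + laplace_noise(sensitivity, epsilon, rng)
```

With sensitivity 1 and ε = 0.5 the scale is λ = 2, so the noise variance is 2λ² = 8 and individual draws are unbounded, which is the utility problem the rest of the paper addresses.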

Geo-indistinguishability
DP can preserve one's privacy strictly with a parameter ε. However, DP and the Laplace mechanism only address one-dimensional data, whereas real-world data are often high dimensional. For example, location data are commonly used in LBSs and are a fundamental data type in big data. Based on the idea of "indistinguishability", Andres et al. proposed the notion of "Geo-indistinguishability" to address two-dimensional data.
Definition 3 (Geo-indistinguishability [27]) A mechanism satisfies ε-Geo-indistinguishability if, for all observations S, the probabilities of producing S from any two points differ by at most a multiplicative factor that grows exponentially with ε times the distance between the two points.
The above definition ensures that two points within distance 1 of each other produce any observation with probabilities that differ by at most a factor of e^ε. The farther apart two points are, the more the probabilities of producing S may differ. This is very similar to the definition of differential privacy, which requires two databases that differ in a single record to produce the same answer with similar probabilities.
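For reference, the condition in Definition 3 is usually written as follows in the geo-indistinguishability literature [27]. The symbols M (the mechanism), p and p′ (two candidate locations) and d(·,·) (Euclidean distance) are chosen here to match the notation of this section; they are an assumption about notation, not a quotation.

```latex
\Pr[M(p) \in S] \;\le\; e^{\varepsilon\, d(p, p')}\; \Pr[M(p') \in S]
\qquad \text{for all } p, p' \in \mathbb{R}^2 \text{ and all sets of observations } S .
```

For d(p, p′) ≤ 1 the bound reduces to the factor e^ε that appears in Definition 1.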
Because the Laplace mechanism can only process one-dimensional data, Geo-indistinguishability uses a distribution defined on the plane. To obtain it, only the absolute difference |p − μ| in (1) needs to be replaced by the Euclidean distance d(p, μ) in the plane, which leads to a natural extension of the Laplacian distribution from one dimension to two dimensions.
Definition 4 (Planar Laplacian Mechanism [27]) Given the privacy budget ε ∈ R⁺ and an individual's real location p ∈ R², the PDF of the privacy-preserving algorithm at any other point p′ ∈ R² is:

$$D_\varepsilon(p)(p') = \frac{\varepsilon^2}{2\pi}\, e^{-\varepsilon\, d(p, p')}, \tag{6}$$

where ε²/2π is a normalization factor.
They call this function the planar Laplacian centered at p. Note that the projection of the planar Laplacian onto any vertical plane passing through its center yields a graph proportional to that of the linear Laplacian.

RE-DP
In this section, we first propose the definition of RE-DP, which satisfies DP with a constrained relative error. We then present our implementation mechanism based on the truncated Laplace distribution, which guarantees DP while limiting the perturbation error to a fixed bound. Finally, we apply a linear transformation to keep the mean, variance and MSE the same as for the original data, so as to preserve its utility.

Definition of RE-DP
DP in Definition 1 only addresses one-dimensional data. As discussed in the "Introduction" section, the SP usually has a requirement on the perturbation error. In this case, the definition of DP is not appropriate. In this section, we propose the notion of RE-DP with a relative error constraint. First of all, we give the definition of relative error.
Definition 7 gives the form of noise that provides a bounded relative error. Compared with the standard Laplace distribution, the PDF of the truncated Laplace distribution has an extra factor (1 − e^{−αs_z/λ})^{−1}. This factor normalizes the truncated density so that its cumulative distribution function reaches 1 at the upper bound. In addition, the relative error is limited by the bound. We can implement this noise in a post-hoc way by discarding out-of-range sanitized results of the conventional Laplace mechanism until an in-range value is obtained. Figure 3 shows the PDFs of DP and RE-DP with λ = 1. The differences between them are that RE-DP is limited to a bounded range on the x axis and that its peak is higher than that of DP.
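The post-hoc procedure described above (draw standard Laplace noise and discard out-of-range values) can be sketched as follows. The function name and signature are hypothetical, the absolute bound is taken to be α·|s_z| as an assumption about how the relative error constraint is applied, and Laplace samples are drawn as the difference of two unit exponential variates.

```python
import random

def truncated_laplace_noise(s_z, sensitivity, epsilon, alpha, rng=random):
    """Rejection-sample Laplace(0, lam) noise conditioned on |z| <= alpha * |s_z|.

    Illustrative sketch: lam = sensitivity / epsilon, and the bound
    alpha * |s_z| is an assumed reading of the relative error constraint.
    """
    lam = sensitivity / epsilon
    bound = alpha * abs(s_z)
    if lam <= 0 or bound <= 0:
        raise ValueError("scale and bound must be positive")
    while True:
        z = lam * (rng.expovariate(1.0) - rng.expovariate(1.0))
        if abs(z) <= bound:  # keep only in-range draws
            return z
```

Each draw is accepted with probability 1 − e^{−αs_z/λ}, which is the reciprocal of the extra normalization factor in the truncated PDF, so the loop becomes slow when the bound is small relative to λ.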

Statistical properties
In the above section, we gave the method to generate noise satisfying DP with a relative error constraint. However, the challenging issue is that the statistical properties of the data to be protected change after applying the truncated Laplace mechanism. In this paper, we investigate three properties commonly used in data mining tasks: mean, variance and MSE. Since the standard Laplacian distribution is symmetric with mean 0 and our truncation is also symmetric, our mechanism does not change the mean of the noise, as shown in Theorem 1.

Theorem 1 The mean of the random variables generated by our truncated Laplacian mechanism is 0.
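Proof Assume z is a random variable that conforms to the truncated Laplacian distribution. Then the mean of z is:

$$E(z) = \int_{-\alpha s_z}^{\alpha s_z} \frac{z}{2\lambda\,(1 - e^{-\alpha s_z/\lambda})}\, e^{-|z|/\lambda}\, dz.$$

Let z/λ = y, then we have

$$E(z) = \frac{\lambda}{2\,(1 - e^{-\alpha s_z/\lambda})} \int_{-\alpha s_z/\lambda}^{\alpha s_z/\lambda} y\, e^{-|y|}\, dy = 0,$$

because the integrand is an odd function of y and the interval is symmetric about 0.
Theorem 1 demonstrates that the mean of our truncated Laplacian mechanism is the same as that of the standard Laplacian mechanism. In addition, the variance of the standard Laplacian distribution is known to be 2λ²; Theorem 2 shows how the variance changes under our proposed mechanism.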
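Theorem 2 Given a random variable z that conforms to the truncated Laplacian distribution, the variance of z is:

$$E(z^2) = 2\lambda^2 - \frac{(\alpha^2 s_z^2 + 2\lambda\alpha s_z)\, e^{-\alpha s_z/\lambda}}{1 - e^{-\alpha s_z/\lambda}}.$$

Proof Since E(z) = 0, the variance of the random variable z is:

$$E(z^2) = \int_{-\alpha s_z}^{\alpha s_z} \frac{z^2}{2\lambda\,(1 - e^{-\alpha s_z/\lambda})}\, e^{-|z|/\lambda}\, dz.$$

Let z/λ = y, then we have

$$E(z^2) = \frac{\lambda^2}{1 - e^{-\alpha s_z/\lambda}} \int_{0}^{\alpha s_z/\lambda} y^2 e^{-y}\, dy = \frac{\lambda^2}{1 - e^{-\alpha s_z/\lambda}} \left[ 2 - \left( \frac{\alpha^2 s_z^2}{\lambda^2} + \frac{2\alpha s_z}{\lambda} + 2 \right) e^{-\alpha s_z/\lambda} \right],$$

which simplifies to the stated expression.
Theorem 2 indicates that the variance of the truncated Laplace random variable depends on the bound α, which may affect a large number of applications. According to the properties of the Laplace distribution, a linear transformation of a Laplace variable still conforms to a Laplace distribution. We use this property to generate truncated Laplace noise whose variance is unchanged.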
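The dependence of the variance on the bound can be checked numerically. The sketch below assumes the truncated PDF described earlier, that is, the standard Laplace density multiplied by (1 − e^{−B/λ})^{−1} on [−B, B] with B = αs_z; integrating z² against that density gives the closed form 2λ² − (B² + 2λB)e^{−B/λ}/(1 − e^{−B/λ}). The function names are hypothetical, and the closed form is compared against a plain midpoint-rule integral.

```python
import math

def truncated_laplace_variance(lam, bound):
    """Closed-form E(z^2) of Laplace(0, lam) truncated to [-bound, bound]."""
    t = math.exp(-bound / lam)
    return 2.0 * lam ** 2 - (bound ** 2 + 2.0 * lam * bound) * t / (1.0 - t)

def numeric_variance(lam, bound, steps=20000):
    """Midpoint-rule integral of z^2 * pdf(z) over [-bound, bound]."""
    norm = 1.0 - math.exp(-bound / lam)
    h = bound / steps
    total = 0.0
    for i in range(steps):
        z = (i + 0.5) * h
        total += z * z * math.exp(-z / lam) / (2.0 * lam * norm)
    return 2.0 * total * h  # the density is symmetric, so double the half-range

if __name__ == "__main__":
    for b in (1.0, 3.0, 10.0):
        print(b, truncated_laplace_variance(1.0, b), numeric_variance(1.0, b))
```

The truncated variance is always below 2λ² and approaches it as the bound grows, which is why a compensating linear transformation is needed when the bound is tight.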

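Theorem 3 Let z be a random variable that conforms to the truncated Laplacian distribution with a bound of [−γ, γ], and denote u = wz. If we want u to be bounded by [−αs_z, αs_z] and to have the same variance as the standard Laplacian distribution, then w, γ and α should satisfy the following two conditions:

$$w\gamma = \alpha s_z, \qquad w^2 \left[ 2 - \frac{(\gamma^2/\lambda^2 + 2\gamma/\lambda)\, e^{-\gamma/\lambda}}{1 - e^{-\gamma/\lambda}} \right] = 2. \tag{14}$$

Proof The PDF of u is:

$$f(u) = \frac{1}{2\lambda w\,(1 - e^{-\gamma/\lambda})}\, e^{-|u|/(\lambda w)}, \qquad u \in [-w\gamma, w\gamma].$$

Then the mean of u is E(u) = wE(z) = 0, and the variance E(u²) of the random variable u is:

$$E(u^2) = \int_{-w\gamma}^{w\gamma} \frac{u^2}{2\lambda w\,(1 - e^{-\gamma/\lambda})}\, e^{-|u|/(\lambda w)}\, du.$$

Let u/(λw) = y, then we have

$$E(u^2) = \frac{\lambda^2 w^2}{1 - e^{-\gamma/\lambda}} \int_{0}^{\gamma/\lambda} y^2 e^{-y}\, dy = \lambda^2 w^2 \left[ 2 - \frac{(\gamma^2/\lambda^2 + 2\gamma/\lambda)\, e^{-\gamma/\lambda}}{1 - e^{-\gamma/\lambda}} \right].$$

If w, γ and α satisfy the relationship in Equation (14), then E(u²) = 2λ². From Equation (14) we obtain w = αs_z/γ. Given a fixed relative bound αs_z and privacy degree λ, substituting this into the second equation yields the bound γ of the truncated Laplacian distribution.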
Finally, we investigate the MSE after applying our truncated mechanism and linear transformation. The MSE of a random variable is the sum of its variance and its squared bias. Since the mean E(z) and variance E(z²) of the random variable z do not change compared with the standard Laplacian distribution after applying our truncated mechanism and linear transformation, the MSE can be regarded as an unbiased estimate with respect to standard Laplacian noise. In other words, the MSE of our mechanism is also unchanged compared with that of the standard Laplacian mechanism. Algorithm 1 shows the complete workflow of our solution RE-DP.

Algorithm 1 RE-DP
Require: ε, query result s_z, Δf.
Ensure: z, perturbed query result s_z′.

RE-geo-indistinguishability
In the last section, we proposed the notion of RE-DP and its implementation mechanism in one dimension. Two-dimensional Laplace noise is also commonly used in real-world applications, e.g., geographic data mining.
In this section, we extend our notion and mechanism to the two-dimensional case. We first propose the notion of RE-Geo-indistinguishability to define the two-dimensional case with relative error. Then we give the truncated two-dimensional noise form that satisfies the notion of RE-Geo-indistinguishability. Finally, we give a linear transformation method to maintain the statistical properties (mean, variance and MSE).

Definition of RE-geo-indistinguishability
In Geo-indistinguishability, the definition of privacy formalizes an intuitive concept: the location of a user (typical two-dimensional data) is protected within a radius r, with a privacy level that depends on R, in a way that satisfies DP. Noise that conforms to Equation (6) preserves the privacy of Geo-indistinguishability.
Data curators always have an accuracy requirement, but the r generated by the Geo-indistinguishability mechanism has no finite upper bound, which destroys the accuracy of the published data.
We first give the formal definition of Geo-indistinguishability with a relative error constraint, as shown in Definition 8. The forms of Geo-indistinguishability and RE-Geo-indistinguishability are similar. The difference between them is that the relative error r in RE-Geo-indistinguishability is limited by α.

Truncated planar Laplacian mechanism
In this section, we propose a truncated planar Laplacian mechanism that generates two-dimensional Laplace noise with a relative error bound α, so as to satisfy the definition of RE-Geo-indistinguishability.
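Definition 9 (Truncated Planar Laplacian Mechanism) Given the parameter ε ∈ R⁺ and the actual location p ∈ R², the PDF of our noise mechanism at any other point p′ ∈ R² is proportional to the planar Laplacian density e^{−εd(p, p′)} on the region d(p, p′) ≤ αd(p, o) and is zero elsewhere, where the normalization constant is chosen so that the density integrates to 1 over this region.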
Next, we transform the noise from the Cartesian coordinate system to the polar coordinate system so that it can be sampled conveniently in practice. Theorem 4 gives the noise form in the polar coordinate system.

Theorem 4
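The PDF of the two-dimensional Laplace noise with a relative error α is:

$$D_\varepsilon(r, \theta) = \frac{\varepsilon^2\, r\, e^{-\varepsilon r}}{2\pi \left[ 1 - (1 + \varepsilon\alpha d(p, o))\, e^{-\varepsilon\alpha d(p, o)} \right]}, \qquad 0 \le r \le \alpha d(p, o),\ 0 \le \theta < 2\pi.$$

Proof We call it the truncated two-dimensional Laplace distribution. We prove that the integral of this PDF over the bounded region equals 1, which means that the PDF in Theorem 4 is the PDF of a probability distribution:

$$\int_0^{2\pi} \int_0^{\alpha d(p, o)} D_\varepsilon(r, \theta)\, dr\, d\theta = \frac{\left[ -(1 + \varepsilon r)\, e^{-\varepsilon r} \right]_0^{\alpha d(p, o)}}{1 - (1 + \varepsilon\alpha d(p, o))\, e^{-\varepsilon\alpha d(p, o)}} = 1.$$

Now we notice that the truncated polar Laplacian defined above has a convenient property for sampling: the angle and the radius are independent of each other. That is, the PDF can be expressed as the product of two marginals. Indeed, if r (radius) and θ (angle) represent these two random variables, the two marginals are:

$$D_{\varepsilon,R}(r) = \frac{\varepsilon^2\, r\, e^{-\varepsilon r}}{1 - (1 + \varepsilon\alpha d(p, o))\, e^{-\varepsilon\alpha d(p, o)}}, \qquad D_{\varepsilon,\Theta}(\theta) = \frac{1}{2\pi}.$$

Hence we have D_ε(r, θ) = D_{ε,R}(r) D_{ε,Θ}(θ). D_{ε,R}(r) is the PDF of a truncated gamma distribution whose shape and scale are 2 and 1/ε, respectively, with boundary αd(p, o). Since Θ and R are independent of each other, to generate a point (r, θ) from D_ε(r, θ) it suffices to generate r and θ from D_{ε,R}(r) and D_{ε,Θ}(θ), respectively.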
The sampling method for the truncated gamma distribution differs from that for the truncated Laplace distribution. We first consider the cumulative distribution function C_ε(r) associated with D_{ε,R}(r). We draw a random variable ρ that conforms to the uniform distribution on the interval [0, 1) and set r = C_ε^{−1}(ρ). Any draw whose radius exceeds the bound is thrown out, and the values that fall in the range (0, αd(p, o)] conform to the distribution D_{ε,R}(r).
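The polar sampling procedure above can be sketched as follows. This is an illustration under assumptions rather than the authors' implementation: instead of inverting C_ε, the radius is drawn with the standard library gamma sampler (shape 2, scale 1/ε, as stated in the text) and out-of-range radii are rejected; the function name is hypothetical, and `max_radius` stands for the bound αd(p, o), which the caller must supply.

```python
import math
import random

def truncated_planar_laplace(epsilon, max_radius, rng=random):
    """Return a noise offset (dx, dy) with radius at most max_radius.

    Radius: Gamma(shape=2, scale=1/epsilon) restricted to (0, max_radius]
    by rejection. Angle: uniform on [0, 2*pi). The two are independent.
    """
    if epsilon <= 0 or max_radius <= 0:
        raise ValueError("epsilon and max_radius must be positive")
    while True:
        r = rng.gammavariate(2.0, 1.0 / epsilon)
        if r <= max_radius:
            break
    theta = rng.uniform(0.0, 2.0 * math.pi)
    return r * math.cos(theta), r * math.sin(theta)

def perturb_location(p, epsilon, max_radius, rng=random):
    """Add truncated planar Laplace noise to a point p = (x, y)."""
    dx, dy = truncated_planar_laplace(epsilon, max_radius, rng)
    return p[0] + dx, p[1] + dy
```

A radius is accepted with probability 1 − (1 + εR)e^{−εR} for R = `max_radius`, so the rejection loop is efficient unless the bound is much smaller than 1/ε.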

Statistical properties
The d(p, p′) in Equation (21) denotes the distance between p and p′, so the PDF of the truncated planar Laplacian distribution in Equation (21) can be written as a function of the coordinate offsets x and y. Since x and y are symmetric around the true position, the mean of the two-dimensional PDF does not change and is the same as that of Geo-indistinguishability. Next, we investigate how the variance and MSE change under our truncated planar Laplacian mechanism.
After Equation (21) is transformed to the polar form, the variance of D_ε(r, θ) can be expressed through γ(•), the incomplete gamma function. Equation (29) indicates that the variance of the truncated gamma random variable is changed. According to the properties of the gamma distribution, a linear transformation of a gamma random variable still conforms to a gamma distribution. We use this property to generate truncated noise whose variance is unchanged: after the linear transformation, the variance of the transformed radius R equals the variance of the random variable r in Geo-indistinguishability.
As in the one-dimensional case, since the mean of the truncated planar Laplacian mechanism has no deviation and the variance of the random variable R does not change after applying our truncated mechanism and linear transformation, the MSE can be regarded as an unbiased estimate with respect to standard Laplacian noise. In other words, the MSE of our mechanism is also unchanged compared with that of the Geo-indistinguishability mechanism. Algorithm 2 shows the workflow of our solution RE-Geo-indistinguishability.

Experimental evaluation
In this section, we evaluate the performance of the proposed solution on multiple real datasets. Specifically, we aim to answer the following questions:
• How does our solution perform on the statistical properties, including mean, variance and MSE, on different datasets? Our proposed RE-DP and RE-Geo-indistinguishability mechanisms keep the statistical properties unchanged while limiting the relative error to a fixed range. In sections "Mean", "Variance" and "MSE", we evaluate the mean, variance and MSE of our proposal and compare the results with current representative approaches.
• How large is the utility advantage of our solution for multiple types of queries? We use the index (α,β)-accuracy to evaluate data utility in multiple applications. In section "(α,β)-accuracy", we evaluate the performance in different types of applications.

Datasets and configuration
We evaluate our proposed solution by experimenting with real datasets. We demonstrate the effectiveness of our solution on four real-world datasets from machine learning, social networking and transportation applications. Each experiment was performed 1000 times.
Adult: Provided by the UC Irvine Machine Learning Repository, this dataset is used to predict whether income exceeds 50K/yr based on census data; it is also known as the "Census Income" dataset. Its total numbers of instances and attributes are 48,842 and 14, respectively.
Social Network [28]: The dataset collects friendship relationships among 32,768 students from an online social networking site. Each student has up to 1678 friends.
Check-in [29]: The Check-in dataset collects the timestamp, user ID, location and location type from 31,000 social users online and 49,000 Americans offline in Los Angeles and New York.
Trajectory: From the Geolife project [30], this dataset consists of 17,621 trajectories, with a total distance of 1,292,951 km and a total duration of 50,176 h. Tracks contain longitude, latitude and altitude coordinates and timestamps.
Among the four datasets, Adult is a machine learning dataset used to test the performance of our solution on machine learning results. Social Network is a typical one-dimensional dataset used to evaluate the performance of our RE-DP. Check-in and Trajectory are two-dimensional datasets used to measure the performance of our RE-Geo-indistinguishability solution.
To evaluate the effectiveness of our solution, we compare it with standard DP and three state-of-the-art mechanisms: Liu [26], Arinjita [24] and Kasra [25]. Besides the statistical properties (mean, variance and MSE), data utility is measured by the (α,β)-accuracy index.

Mean
The mean value of the noise most intuitively reflects the extent to which the added noise affects the original data. To evaluate the effect of the introduced noise, we measure the mean of the noise introduced by state-of-the-art schemes, including standard DP and Geo-indistinguishability. Figure 4 shows the box plots on the four test datasets. We set the privacy budget ε = 1 and conduct the experiments on all datasets.
As shown in Fig. 4, from the top to the bottom of each box plot are the maximum, upper quartile, median, lower quartile and minimum values, respectively. In addition, circles denote outliers. From Fig. 4, we observe that the maximum

Variance
This subsection compares the variance of the noise produced by our solution and by current mechanisms on the different datasets. According to Dwork [9], a privacy budget ε below 1 is an appropriate privacy setting, and we follow this heuristic in the following experiments. To examine our solution across a range of privacy levels, we vary ε from 0.1 to 0.9 in steps of 0.2 on all four datasets.
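The sweep over ε described above can be sketched for the baseline Laplace mechanism as follows. This is a minimal illustration and not the authors' experimental code: the sensitivity of 1, the sample size, the seed and the helper names `laplace_sample` and `noise_statistics` are all assumptions, and the noise is standard Laplace noise rather than the RE-DP noise evaluated in the paper.

```python
import random


def laplace_sample(scale: float, rng: random.Random) -> float:
    """Draw one Laplace(0, scale) sample as the difference of two exponentials."""
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)


def noise_statistics(epsilon: float, sensitivity: float = 1.0,
                     n: int = 20000, seed: int = 0) -> dict:
    """Empirical mean, variance and MSE of Laplace noise with scale sensitivity/epsilon.

    The sensitivity of 1 (a count query) is an assumption made for this sketch.
    """
    rng = random.Random(seed)
    scale = sensitivity / epsilon
    noise = [laplace_sample(scale, rng) for _ in range(n)]
    mean = sum(noise) / n
    variance = sum((z - mean) ** 2 for z in noise) / n
    # MSE of the perturbed answer f(D) + Z with respect to the true answer f(D).
    mse = sum(z * z for z in noise) / n
    return {
        "mean": mean,
        "variance": variance,
        "mse": mse,
        "theoretical_variance": 2.0 * scale ** 2,
    }


# Privacy budgets from 0.1 to 0.9 in steps of 0.2, as in the experiments.
EPSILONS = [0.1, 0.3, 0.5, 0.7, 0.9]

if __name__ == "__main__":
    for eps in EPSILONS:
        stats = noise_statistics(eps)
        print(eps, stats["mean"], stats["variance"], stats["mse"])
```

Under these assumptions the empirical variance should track the closed-form value 2λ² with λ = 1/ε, so both variance and MSE shrink as ε grows.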
Figure 5 shows the variance of the noise on the different datasets for standard one- and two-dimensional DP and the other mechanisms. As shown in Fig. 5, our solution has a lower variance of noise than the other schemes on all four datasets. Specifically, on a one-dimensional dataset such as that in Fig. 5a, when ε = 0.1, RE-DP has a variance of 85 while the current best mechanism, Kasra [2020], achieves 124, an improvement of 31.5%. When

MSE
As shown in Fig. 6, our solutions achieve the lowest MSE among the compared schemes on both one-dimensional and two-dimensional data. Specifically, in one dimension (for example Fig. 6a), when ε = 0.1, our RE-DP achieves an MSE of 215 while the next-best mechanism other than DP achieves 324, an improvement of 33.6%. When ε = 0.9, RE-DP's MSE is 2.6 and outperforms Kasra [2020] by 62.9%. The improvement can also be observed in two dimensions. In Fig. 6c, the MSE of our RE-Geo-indistinguishability mechanism outperforms the next-best mechanism other than Geo-indistinguishability by 49.8% when ε = 0.1 and by 20.0% when ε = 0.9. These results support the conclusion that our solution keeps the MSE small on both one- and two-dimensional data. Our mechanisms perform better because they leave the statistical properties of the noise, including the mean and variance, unchanged while limiting the noise error to a fixed bound.
Figure 6 also compares our solution with standard DP and Geo-indistinguishability. For example, on the Adult dataset, the MSE of standard DP when ε = 0.1 is close to the MSE of RE-DP, which is 215. The two-dimensional case is similar: on the Check-in dataset, the MSE of Geo-indistinguishability is 298 when ε = 0.1, while that of our mechanism is 213. The same pattern can be found in Fig. 6b, d. These results demonstrate that, compared with current methods, our solution has the statistical error closest to that of standard one- and two-dimensional DP. This indicates that our solution can preserve better data utility in multiple real-world applications.
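The improvement percentages quoted for ε = 0.1 in the Variance and MSE subsections can be reproduced with a one-line calculation. The sketch below assumes that "improvement" means the reduction relative to the competing mechanism's value; the function name `relative_improvement` is ours.

```python
def relative_improvement(baseline: float, ours: float) -> float:
    """Percentage reduction of `ours` relative to `baseline`."""
    return 100.0 * (baseline - ours) / baseline


# Variance at epsilon = 0.1: RE-DP 85 versus Kasra [2020] 124.
variance_gain = relative_improvement(124, 85)

# MSE at epsilon = 0.1: RE-DP 215 versus the next-best mechanism 324.
mse_gain = relative_improvement(324, 215)

print(round(variance_gain, 1), round(mse_gain, 1))  # prints 31.5 33.6
```

Both values agree with the reported 31.5% and 33.6% under this reading of "improvement".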

(α,β)-accuracy
Figure 7 shows the utility performance, measured by (α,β)-accuracy, under the privacy setting ε = 1. On the one- and two-dimensional experimental datasets, our solution has a much lower β than current mechanisms, i.e., a higher probability 1 − β under the same α and ε. Therefore, as the privacy-preserving level increases, the accuracy of our solution will increase. When α = 10 and ε = 1, the value of 1 − β for our solution drops to nearly 0, while the values for the other methods are not 0. Similarly, Fig. 7 shows that under the same utility constraint, our solution also provides a significantly better privacy guarantee than the current mechanisms. Our solution therefore shows significant utility and privacy advantages over state-of-the-art approaches.
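As an illustration of how β can be estimated for a given α, the sketch below assumes the common reading of (α,β)-accuracy, namely Pr[|M(D) − f(D)| > α] ≤ β, and applies it to standard Laplace noise with sensitivity 1. Both the definition used here and the helper names `empirical_beta` and `laplace_beta` are assumptions; the paper's own definition appears in an earlier section.

```python
import math
import random


def empirical_beta(noise, alpha: float) -> float:
    """Fraction of noise samples whose magnitude exceeds alpha."""
    return sum(1 for z in noise if abs(z) > alpha) / len(noise)


def laplace_beta(alpha: float, epsilon: float, sensitivity: float = 1.0) -> float:
    """Closed-form tail Pr[|Z| > alpha] for Z ~ Lap(sensitivity / epsilon)."""
    return math.exp(-alpha * epsilon / sensitivity)


# Standard Laplace noise at epsilon = 1 (scale 1), drawn as a difference
# of two exponentials with rate epsilon.
rng = random.Random(0)
epsilon = 1.0
noise = [rng.expovariate(epsilon) - rng.expovariate(epsilon) for _ in range(50000)]

for alpha in (1.0, 2.0, 5.0):
    print(alpha, empirical_beta(noise, alpha), laplace_beta(alpha, epsilon))
```

For unbounded Laplace noise β is positive for every α, whereas any noise confined to a fixed range gives an empirical β of 0 once α exceeds that range.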

Summary for the experimental analysis
The experimental evaluation in sections "Mean", "Variance", "MSE" and "(α,β)-accuracy" supports the following conclusions:
• Our solution keeps the statistical properties, including mean, variance and MSE, close to those of the standard one-dimensional DP and two-dimensional Geo-indistinguishability mechanisms, indicating that it performs well on the statistical properties while limiting the noise error to a fixed range.
• Compared with state-of-the-art schemes, our solution performs best on the statistical properties, demonstrating its effectiveness.
• Our solution retains significant utility gains over the existing approaches under the same privacy budget. It can therefore achieve a good trade-off between privacy protection and data utility when an appropriate privacy budget ε is selected.

Conclusions and future works
Differential privacy provides a good trade-off between privacy preservation and data utility. For this reason, a consensus is emerging in academia and the privacy community around its applications and possible extensions. However, DP places no constraint on the relative error of the noise, which reduces data availability. State-of-the-art schemes attempt to improve data utility while preserving DP, or design noise that satisfies DP while limiting the relative error to a fixed bound, but they do not consider the effect of their methods on the statistical properties.
In this paper, we first give two definitions of DP with limited relative error of noise, in one and two dimensions respectively. We then propose the corresponding mechanisms to realize the definitions in practice. Furthermore, we use linear transformations of the Laplace and Gamma distributions to maintain the statistical properties while limiting the relative error to a fixed bound. Experimental evaluation demonstrates that our mechanism outperforms current schemes in terms of security and utility for large numbers of frequent items.
Our work has limitations. In particular, it currently applies only to count-query or histogram scenarios. One direction for future work is to explore extending our method to complex intelligent algorithms. Another is to design correspondingly improved mechanisms according to the needs of practical applications.

Fig. 1 Effect of true and perturbed positions on the mining results

Fig. 2 Probability density function of random algorithm M on the statistical output of D and D′

Figure 2 shows the PDF of random algorithm M on the statistical output of D and D′. The privacy budget ε is mainly limited by the random algorithm M. The Laplace mechanism is usually used to realize M and is defined as follows.

Definition 2 (Laplacian Mechanism [4,9]) Let f(·) be the statistical function of the output result. Noise samples Z ∼ Lap(λ) drawn from a Laplacian distribution ensure that the randomly perturbed result M(D) = f(D) + Z satisfies ε-DP, where λ is the scale of the Laplacian distribution. The PDF of the Laplacian distribution is formalized by the following formula
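For reference, the standard zero-mean Laplace density with scale λ can be written as below. The calibration λ = Δf/ε, with Δf the global sensitivity of f, is the usual choice for achieving ε-DP and is stated here as an assumption rather than as the authors' exact notation.

```latex
\[
  \mathrm{Lap}(x \mid \lambda) = \frac{1}{2\lambda}\exp\!\left(-\frac{|x|}{\lambda}\right),
  \qquad \lambda = \frac{\Delta f}{\epsilon}
\]
```

This density has mean 0 and variance 2λ², which is why a smaller ε (stronger privacy) leads to a larger noise variance.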

Definition 5 (Relative Error) Let $z$ denote the random noise corresponding to the query result $s_z$, where $s_z \in S$. Then $z/s_z$ is called the relative error and is limited by the bound α, i.e., $|z/s_z| \le \alpha$, where $z$ is generated with ε-RE-DP. We then say that the relative error of $z$ is α.
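To make the bound concrete, the sketch below checks the relative error of a single noise value and estimates how often unbounded Laplace noise exceeds a bound α for a given query result. It illustrates the definition only and is not the RE-DP mechanism: the unit sensitivity, the seed and the helper names `relative_error` and `violation_rate` are assumptions.

```python
import random


def relative_error(z: float, s: float) -> float:
    """Return |z / s| for noise z added to a nonzero query result s."""
    if s == 0:
        raise ValueError("relative error is undefined for a zero query result")
    return abs(z / s)


def violation_rate(s: float, alpha: float, epsilon: float,
                   n: int = 20000, seed: int = 0) -> float:
    """Fraction of Lap(1/epsilon) draws whose relative error exceeds alpha.

    Standard Laplace noise with sensitivity 1 is assumed; it has no bound,
    so some draws can always violate |z / s| <= alpha.
    """
    rng = random.Random(seed)
    draws = (rng.expovariate(epsilon) - rng.expovariate(epsilon) for _ in range(n))
    return sum(1 for z in draws if relative_error(z, s) > alpha) / n


if __name__ == "__main__":
    print(relative_error(-5.0, 50.0))                      # 0.1
    print(violation_rate(s=10, alpha=0.1, epsilon=1.0))    # roughly exp(-1)
    print(violation_rate(s=100, alpha=0.1, epsilon=1.0))   # close to 0
```

Small query results are the hard case: for the same α and ε, the violation rate of unbounded noise grows as s shrinks, which is the situation a relative error constraint is meant to rule out.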

Definition 8 (RE-Geo-indistinguishability) For all observations $S$, the mechanism satisfies RE-Geo-indistinguishability if and only if

$$\frac{\Pr(S \mid p)}{\Pr(S \mid p')} \le e^{\epsilon r} \quad \forall r > 0,\ \forall p, p' : \frac{d(p, p')}{d(p, o)} \le \alpha, \tag{20}$$

where $d(p, p')$ and $d(p, o)$ are the Euclidean distances from $p$ to the perturbed location $p'$ and to the coordinate origin $o$, respectively, in the Cartesian coordinate system.
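The distance-ratio condition in the definition can be checked for a pair of points as sketched below. This covers only the constraint d(p, p′)/d(p, o) ≤ α, not the probability ratio; the origin at (0, 0) and the helper names `relative_displacement` and `satisfies_bound` are assumptions.

```python
import math


def relative_displacement(p, p_prime, origin=(0.0, 0.0)) -> float:
    """Return d(p, p') / d(p, o) for 2-D points in Cartesian coordinates."""
    base = math.dist(p, origin)
    if base == 0:
        raise ValueError("p coincides with the origin, so the ratio is undefined")
    return math.dist(p, p_prime) / base


def satisfies_bound(p, p_prime, alpha: float, origin=(0.0, 0.0)) -> bool:
    """Check the two-dimensional relative error constraint for one pair of points."""
    return relative_displacement(p, p_prime, origin) <= alpha


if __name__ == "__main__":
    p, p_prime = (3.0, 4.0), (3.0, 5.0)
    print(relative_displacement(p, p_prime))   # 1.0 / 5.0 = 0.2
    print(satisfies_bound(p, p_prime, 0.25))   # True
```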

Fig. 4 Mean of noise on different datasets.a Adult.b Social Network.c Check-in.d Trajectory

Fig. 5 Variance of noise on different datasets.a Adult.b Social Network.c Check-in.d Trajectory

Fig. 6 MSE of noise on different datasets.a Adult.b Social Network.c Check-in.d Trajectory

Table 1 Summary of related work