Differentially private counting of users’ spatial regions

Abstract

Mining of spatial data is an enabling technology for mobile services, Internet-connected cars and the Internet of Things. But the very distinctiveness of spatial data that drives utility can cost user privacy. Past work has focused upon points and trajectories for differentially private release. In this work, we continue the tradition of privacy-preserving spatial analytics, focusing not on point or path data, but on planar spatial regions. Such data represent the area of a user’s most frequent visitation—such as “around home and nearby shops”. Specifically we consider the differentially private release of data structures that support range queries for counting users’ spatial regions. Counting planar regions leads to unique challenges not faced in existing work. A user’s spatial region that straddles multiple data structure cells can lead to duplicate counting at query time. We provably avoid this pitfall by leveraging the Euler characteristic for the first time with differential privacy. To address the increased sensitivity of range queries to spatial region data, we calibrate privacy-preserving noise using bounded user region size and a constrained inference that uses robust least absolute deviations. Our novel constrained inference reduces noise and promotes covertness by (privately) imposing consistency. We provide a full end-to-end theoretical analysis of both differential privacy and high-probability utility for our approach using concentration bounds. A comprehensive experimental study on several real-world datasets establishes practical validity.

This is a preview of subscription content, log in to check access.

Fig. 1
Fig. 2
Fig. 3
Fig. 4
Fig. 5
Fig. 6
Fig. 7
Fig. 8
Fig. 9
Fig. 10
Fig. 11
Fig. 12
Fig. 13
Fig. 14

Notes

  1. 1.

    We use body and region interchangeably to refer to a user’s spatial area. We use the term body to distinguish query regions from users’ regions

  2. 2.

    In the literature, the terms multiple, double or distinct counting are used interchangeably. We suggest the term “duplicate” as it conveys that objects are over-counted.

  3. 3.

    This paper extends our ICDM’2016 conference paper [18].

References

  1. 1.

    Ács G, Castelluccia C, Chen R (2012) Differentially private histogram publishing through lossy compression. In: ICDM’12, pp 1–10

  2. 2.

    Andrés ME, Bordenabe NE, Chatzikokolakis K, Palamidessi C (2013) Geo-indistinguishability: differential privacy for location-based systems. In: CCS’13, pp 901–914

  3. 3.

    Barak B, Chaudhuri K, Dwork C, Kale S, McSherry F, Talwar K (2007) Privacy, accuracy, and consistency too: a holistic solution to contingency table release. In: Proceedings of the twenty-sixth ACM SIGACT-SIGMOD-SIGART symposium on principles of database systems, June 11–13, 2007, Beijing, China, pp 273–282

  4. 4.

    Beigel R, Tanin E (1998) The geometry of browsing. In: LATIN ’98: theoretical informatics, third latin American symposium, pp 331–340

  5. 5.

    Beresford AR, Stajano F (2003) Location privacy in pervasive computing. IEEE Pervasive Comput 2(1):46–55

    Article  Google Scholar 

  6. 6.

    Braz F, Orlando S, Orsini R, Raffaetà A, Roncato A, Silvestri C (2007) Approximate aggregations in trajectory data warehouses. In: Proceedings of the 23rd international conference on data engineering workshops, ICDE 2007, pp 536–545

  7. 7.

    Chawla S, Dwork C, McSherry F, Talwar K (2005) On the utility of privacy-preserving histograms. In: Proceedings of the 21st conference on uncertainty in artificial intelligence

  8. 8.

    Chen R, Fung BCM, Desai BC, Sossou NM (2012) Differentially private transit data publication: a case study on the Montreal transportation system. In: KDD’12, pp 213–221

  9. 9.

    Chow CY, Mokbel MF (2011a) Privacy of spatial trajectories. In: Zheng Y, Zhou X (eds) Computing with spatial trajectories. Springer, New York, pp 109–141

  10. 10.

    Chow C-Y, Mokbel MF (2011) Trajectory privacy in location-based services and data publication. SIGKDD Explor 13(1):19–29

    Article  Google Scholar 

  11. 11.

    Cormode G, Procopiuc CM, Srivastava D, Shen E, Yu T (2012) Differentially private spatial decompositions. In: ICDE’12, pp 20–31

  12. 12.

    Dielman TE (2005) Least absolute value regression: recent contributions. J Stat Comput Simul 75(4):263–286

    MathSciNet  Article  MATH  Google Scholar 

  13. 13.

    Dwork C (2008) Differential privacy: a survey of results. In: Theory and applications of models of computation, 5th international conference, TAMC, pp 1–19

  14. 14.

    Dwork C (2011) A firm foundation for private data analysis. Commun ACM 54(1):86–95

    Article  Google Scholar 

  15. 15.

    Dwork C, McSherry F, Nissim K, Smith A (2006) Calibrating noise to sensitivity in private data analysis. In: Theory of cryptography, third theory of cryptography conference, TCC, vol 3876. Lecture notes in computer science, Springer, Berlin, pp 265–284

  16. 16.

    Fan L, Xiong L, Sunderam VS (2013) Differentially private multi-dimensional time series release for traffic monitoring. In: IFIP’13. Proceedings, pp 33–48

  17. 17.

    Fanaeepour M, Kulik L, Tanin E, Rubinstein BIP (2015) The CASE histogram: privacy-aware processing of trajectory data using aggregates. GeoInformatica 19(4):747–798

    Article  Google Scholar 

  18. 18.

    Fanaeepour M, Rubinstein BIP (2016) Beyond points and paths: counting private bodies. In: ICDM, pp 131–140

  19. 19.

    Ghinita G (2013) Privacy for location-based services. In: Bertino E, Sandhu R (eds) Synthesis lectures on information security, privacy, and trust. Morgan & Claypool Publishers, San Rafael

  20. 20.

    Gruteser M, Liu X (2004) Protecting privacy in continuous location-tracking applications. IEEE Secur Priv 2(2):28–34

    Article  Google Scholar 

  21. 21.

    Gurobi Optimization, Inc. (2015) Gurobi optimizer reference manual. http://www.gurobi.com

  22. 22.

    Hay M, Rastogi V, Miklau G, Suciu D (2010) Boosting the accuracy of differentially private histograms through consistency. PVLDB 3(1):1021–1032

    Google Scholar 

  23. 23.

    He X, Cormode G, Machanavajjhala A, Procopiuc CM, Srivastava D (2015) DPT: differentially private trajectory synthesis using hierarchical reference systems. PVLDB 8(11):1154–1165

    Google Scholar 

  24. 24.

    Hsu J, Gaboardi M, Haeberlen A, Khanna S, Narayan A, Pierce BC, Roth A (2014) Differential privacy: an economic method for choosing epsilon. In: IEEE 27th computer security foundations symposium, CSF 2014, pp 398–410

  25. 25.

    Iliffe J, Lott R (2008) Datums and map projections for remote sensing, GIS and surveying. Whittles Publishing. https://books.google.com.au/books?id=u_4RAQAAIAAJ

  26. 26.

    Inan A, Kantarcioglu M, Ghinita G, Bertino E (2010) Private record matching using differential privacy. In: EDBT’10, pp 123–134

  27. 27.

    Jones E, Oliphant T, Peterson P et al (2001) SciPy: open source scientific tools for Python. http://www.scipy.org/

  28. 28.

    Karmarkar N (1984) A new polynomial-time algorithm for linear programming. In: STOC’84, pp 302–311

  29. 29.

    Kifer D, Lin B (2010) Towards an axiomatization of statistical privacy and utility. In: Proceedings of the twenty-ninth ACM SIGMOD-SIGACT-SIGART symposium on principles of database systems, PODS, pp 147–158

  30. 30.

    Krumm J (2007) Inference attacks on location tracks. In: 5th international conference on pervasive computing, PERVASIVE’07, pp 127–143

  31. 31.

    Krumm J (2009) A survey of computational location privacy. Pers Ubiquitous Comput 13(6):391–399

    Article  Google Scholar 

  32. 32.

    Leonardi L, Orlando S, Raffaetà A, Roncato A, Silvestri C, Andrienko GL, Andrienko NV (2014) A general framework for trajectory data warehousing and visual OLAP. GeoInformatica 18(2):273–312

    Article  Google Scholar 

  33. 33.

    Li C, Hay M, Miklau G, Wang Y (2014) A data- and workload-aware query answering algorithm for range queries under differential privacy. PVLDB 7(5):341–352

    Google Scholar 

  34. 34.

    López IFV, Snodgrass RT, Moon B (2005) Spatiotemporal aggregate computation: a survey. IEEE Trans Knowl Data Eng TKDE 17(2):271–286

    Article  Google Scholar 

  35. 35.

    Marketos G, Frentzos E, Ntoutsi I, Pelekis N, Raffaetà A, Theodoridis Y (2008) Building real-world trajectory warehouses. In: Seventh ACM international workshop on data engineering for wireless and mobile access, Mobide 2008, pp 8–15

  36. 36.

    Mir DJ, Isaacman S, Cáceres R, Martonosi M, Wright RN (2013) DP-WHERE: differentially private modeling of human mobility. In: Proceedings of the 2013 IEEE international conference on big data, pp 580–588

  37. 37.

    Papadias D, Kalnis P, Zhang J, Tao Y (2001) Efficient OLAP operations in spatial data warehouses. In: 7th international symposium on advances in spatial and temporal databases, SSTD’01, pp 443–459

  38. 38.

    Papadias D, Tao Y, Kalnis P, Zhang J (2002) Indexing spatio-temporal data warehouses. In: Proceedings of the 18th international conference on data engineering, ICDE’02, pp 166–175

  39. 39.

    Piorkowski M, Sarafijanovoc-Djukic N, Grossglauser M (2009) A parsimonious model of mobile partitioned networks with clustering. In: COMSNETS. http://www.comsnets.org

  40. 40.

    Primault V, Mokhtar SB, Lauradoux C, Brunie L (2014) Differentially private location privacy in practice. CoRR abs/1410.7744

  41. 41.

    Qardaji WH, Yang W, Li N (2013) Differentially private grids for geospatial data. In: ICDE’13, pp 757–768

  42. 42.

    Rubinstein BIP, Bartlett PL, Huang L, Taft N (2012) Learning in a large function space: privacy-preserving mechanisms for SVM learning. J Priv Confid 4(1):65–100

    Google Scholar 

  43. 43.

    Sun C, Agrawal D, El Abbadi A (2002a) Exploring spatial datasets with histograms. In: Proceedings of the 18th international conference on data engineering, ICDE, pp 93–102

  44. 44.

    Sun C, Agrawal D, El Abbadi A (2002b) Selectivity estimation for spatial joins with geometric selections. In: EDBT’02, pp 609–626

  45. 45.

    Sun C, Bandi N, Agrawal D, El Abbadi A (2006) Exploring spatial datasets with histograms. Distrib Parallel Databases 20(1):57–88

    Article  Google Scholar 

  46. 46.

    Tao Y, Kollios G, Considine J, Li F, Papadias D (2004) Spatio-temporal aggregation using sketches. In: Proceedings of the 20th international conference on data engineering, ICDE 2004, pp 214–225

  47. 47.

    Tao Y, Papadias D, Zhang J (2002) Aggregate processing of planar points. In: 8th international conference on extending database technology, EDBT 2002, pp 682–700

  48. 48.

    Timko I, Böhlen MH, Gamper J (2009) , Sequenced spatio-temporal aggregation in road networks. In: EDBT 2009, 12th international conference on extending database technology, pp 48–59

  49. 49.

    To H, Ghinita G, Shahabi C (2014) A framework for protecting worker location privacy in spatial crowdsourcing. PVLDB 7(10):919–930

    Google Scholar 

  50. 50.

    Trudeau R (1993) Introduction to graph theory. Dover books on mathematics series. Dover Publications, New York

    Google Scholar 

  51. 51.

    Wang M, Zhang X, Meng X (2013) , DiffR-tree: a differentially private spatial index for OLAP query. In: WAIM’13, pp 705–716

  52. 52.

    Xie H, Tanin E, Kulik L (2007) Distributed histograms for processing aggregate data from moving objects. In: 8th international conference on mobile data management (MDM 2007), pp 152–157

  53. 53.

    Xie H, Tanin E, Kulik L, Scheuermann P, Trajcevski G, Fanaeepour M (2014) Euler histogram tree: a spatial data structure for aggregate range queries on vehicle trajectories. In: 7th ACM SIGSPATIAL international workshop on computational transportation science, IWCTS 2014

  54. 54.

    Yuan J, Zheng Y, Zhang C, Xie W, Xie X, Sun G, Huang Y (2010) T-drive: driving directions based on taxi trajectories. In: 18th ACM SIGSPATIAL international symposium on advances in geographic information systems, ACM-GIS 2010, pp 99–108

  55. 55.

    Zhang J, Ghinita G, Chow C (2014) Differentially private location recommendations in geosocial networks. In: MDM’14, pp 59–68

  56. 56.

    Zheng Y, Xie X, Ma W (2010) Geolife: a collaborative social networking service among user, location and trajectory. IEEE Data Eng Bull 33(2):32–39

    Google Scholar 

Download references

Acknowledgements

This work was supported in part by Australian Research Council DECRA Grant DE160100584.

Author information

Affiliations

Authors

Corresponding author

Correspondence to Maryam Fanaeepour.

Rights and permissions

Reprints and Permissions

About this article

Verify currency and authenticity via CrossMark

Cite this article

Fanaeepour, M., Rubinstein, B.I.P. Differentially private counting of users’ spatial regions. Knowl Inf Syst 54, 5–32 (2018). https://doi.org/10.1007/s10115-017-1113-6

Download citation

Keywords

  • Differential privacy
  • Euler histograms
  • Location privacy
  • Spatial regions