The CASE histogram: privacy-aware processing of trajectory data using aggregates

Fanaeepour, Maryam; Kulik, Lars; Tanin, Egemen; P. Rubinstein, Benjamin I.

doi:10.1007/s10707-015-0228-8

The CASE histogram: privacy-aware processing of trajectory data using aggregates

Published: 14 July 2015

Volume 19, pages 747–798, (2015)
Cite this article

GeoInformatica Aims and scope Submit manuscript

Maryam Fanaeepour ORCID: orcid.org/0000-0003-2654-982X^1,2,
Lars Kulik^1,2,
Egemen Tanin^1,2 &
…
Benjamin I. P. Rubinstein¹

619 Accesses
8 Citations
Explore all metrics

Abstract

Due to the high uptake of location-based services (LBSs), large spatio-temporal datasets of moving objects’ trajectories are being created every day. An important task in spatial data analytics is to service range queries by returning trajectory counts within a queried region. The question of how to keep an individual user’s data private whilst enabling spatial data analytics by third parties has become an urgent research direction. Indeed, it is increasingly becoming a concern for users. To preserve privacy we discard individual trajectories and aggregate counts over a spatial and temporal partition. However the privacy gained comes at a cost to utility: trajectories passing through multiple cells and re-entering a query region, lead to inaccurate query responses. This is known as the distinct counting problem. We propose the Connection Aware Spatial Euler (CASE) histogram to address this long-standing problem. The CASE histogram maintains the connectivity of a moving object path, but does not require the ID of an object to distinguish multiple entries into an arbitrary query region. Our approach is to process trajectories offline into aggregate counts which are sent to third parties, rather than the original trajectories. We also explore modifications of our aggregate counting approach that preserve differential privacy. Theoretically and experimentally we demonstrate that our method provides a high level of accuracy compared to the best known methods for the distinct counting problem, whilst preserving privacy. We conduct our experiments on both synthetic and real datasets over two competitive Euler histogram-based methods presented in the literature. Our methods enjoy improvements to accuracy from 10 % up to 70 % depending on trip data and query region size, with the greatest increase seen on the Microsoft T-Drive real dataset, representing a more than tripling of accuracy.

This is a preview of subscription content, log in via an institution to check access.

Access this article

Log in via an institution

Price excludes VAT (USA)
Tax calculation will be finalised during checkout.

Instant access to the full article PDF.

Institutional subscriptions

Mobility Data Analytics with KNOT: The KNime mObility Toolkit

HV: A Feature Based Method for Trajectory Dataset Profiling

A differentially private algorithm for range queries on trajectories

Article 28 October 2020

Notes

This expectation is taken with respect to the random possibilities.

References

Kerckhoffs A (1883) Journal des sciences militaires IX:5–38
Google Scholar
Barak B, Chaudhuri K, Dwork C, Kale S, McSherry F, Talwar K (2007) Privacy, accuracy, and consistency too: a holistic solution to contingency table release. In: Proceedings of the Twenty-Sixth ACM SIGACT-SIGMOD-SIGART Symposium on Principles of Database Systems, June 11-13, 2007, Beijing, China, pp 273–282
Beigel R, Tanin E (1998) The geometry of browsing. In: LATIN ’98: Theoretical Informatics, Third Latin American Symposium, pp 331–340
Beresford AR, Stajano F (2003) Location privacy in pervasive computing. IEEE Pervasive Comput 2(1):46–55
Article Google Scholar
Beresford AR, Stajano F (2004) Mix zones: User privacy in location-aware services. In: 2nd IEEE Conference on Pervasive Computing and Communications Workshops (PerCom 2004 Workshops), pp 127–131
Bogorny V, Shekhar S (2010) Spatial and spatio-temporal data mining. In: ICDM 2010, The 10th IEEE International Conference on Data Mining, p 1217
Braz F, Orlando S, Orsini R, Raffaetà A, Roncato A, Silvestri C (2007) Approximate aggregations in trajectory data warehouses. In: Proceedings of the 23rd International Conference on Data Engineering Workshops, ICDE 2007, pp 536–545
Buchin K, Buchin M, van Kreveld MJ, Löffler M, Luo J, Silveira RI (2012) Processing aggregated data: the location of clusters in health data. GeoInformatica 16 (3):497–521
Article Google Scholar
Chawla S, Dwork C, McSherry F, Talwar K (2005) On the utility of privacy-preserving histograms. In: Proceedings of the 21st Conference on Uncertainty in Artificial Intelligence
Chow CY, Mokbel MF (2011) Privacy of spatial trajectories. In: Computing with Spatial Trajectories, pp 109–141
Chow CY, Mokbel MF (2011) Trajectory privacy in location-based services and data publication. SIGKDD Explorations 13(1):19–29
Article Google Scholar
Dingledine R, Mathewson N, Syverson PF (2004) Tor: The second-generation onion router. In: Proceedings of the 13th USENIX Security Symposium, August 9-13, 2004, San Diego, CA, USA, pp 303–320
Dwork C (2008) Differential privacy: A survey of results. In: Theory and Applications of Models of Computation, 5th International Conference, TAMC 2008, Xi’an, China, April 25-29, 2008. Proceedings, pp 1–19
Dwork C, McSherry F, Nissim K, Smith A (2006) Calibrating noise to sensitivity in private data analysis. In: Theory of Cryptography, Third Theory of Cryptography Conference, TCC 2006, New York, NY, USA, March 4-7, 2006, Proceedings, Lecture Notes in Computer Science, vol 3876, pp 265–284. Springer
Dwork C, Naor M, Pitassi T, Rothblum GN, Yekhanin S (2010) Pan-private streaming algorithms. In: Innovations in Computer Science - ICS 2010, Tsinghua University, Beijing, China, January 5-7, 2010. Proceedings, pp 66–80
Giannotti F, Nanni M, Pinelli F, Pedreschi D (2007) Trajectory pattern mining. In: Proceedings of the 13th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pp 330–339
Gómez LI, Kuijpers B, Moelans B, Vaisman AA (2011) A state-of-the-art in spatio-temporal data warehousing, OLAP and mining. In: Integrations of Data Warehousing, Data Mining and Database Technologies, pp 200–236
Gruteser M, Liu X (2004) Protecting privacy in continuous location-tracking applications. IEEE Secur Priv 2(2):28–34
Article Google Scholar
Jeung H, Yiu ML, Jensen CS (2011) Trajectory pattern mining. In: Computing with Spatial Trajectories, pp 143–177
Krumm J (2007) Inference attacks on location tracks
Leonardi L, Orlando S, Raffaetà A, Roncato A, Silvestri C, Andrienko GL, Andrienko NV (2014) A general framework for trajectory data warehousing and visual OLAP. GeoInformatica 18(2):273– 312
Article Google Scholar
Loo BP (2006) Validating crash locations for quantitative spatial analysis: A GIS-based approach. Accid Anal Prev 38(5):879–886
Article Google Scholar
López IFV, Snodgrass RT, Moon B (2005) Spatiotemporal aggregate computation: a survey. IEEE Trans Knowl Data Eng, TKDE 17(2):271–286
Article Google Scholar
Marketos G, Frentzos E, Ntoutsi I, Pelekis N, Raffaetà A, Theodoridis Y (2008) Building real-world trajectory warehouses. In: Seventh ACM International Workshop on Data Engineering for Wireless and Mobile Access, Mobide 2008, pp 8–15
MicrosoftNewsCenter (2011) Data privacy day tackles concerns as location-based services grow in popularity. Accessed: 2013-09-15. http://www.microsoft.com/en-us/news/features/2011/jan11/01-26dataprivacyday.aspx
Narayanan A (2009) Data privacy: The non-interactive setting. Ph.D. thesis, Austin, TX, USA. AAI3368859
OpenStreetMap The free wiki world map. Accessed: 2013-05-10. http://www.openstreetmap.org/
Orlando S, Orsini R, Raffaetà A, Roncato A, Silvestri C (2007) Spatio-temporal aggregations in trajectory data warehouses. In: Data Warehousing and Knowledge Discovery, 9th International Conference, DaWaK 2007, pp 66–77
Orlando S, Orsini R, Raffaetà A, Roncato A, Silvestri C (2007) Trajectory data warehouses: Design and implementation issues. J Comput Sci Eng, JCSE 1(2):211–232
Article Google Scholar
Papadias D, Kalnis P, Zhang J, Tao Y (2001) Efficient OLAP operations in spatial data warehouses. In: Advances in Spatial and Temporal Databases, 7th International Symposium, SSTD 2001, pp 443– 459
Pedersen TB, Tryfona N (2001) Pre-aggregation in spatial data warehouses. In: Advances in Spatial and Temporal Databases, 7th International Symposium, SSTD 2001, pp 460–480
Phillips P, Lee I (2011) Crime analysis through spatial areal aggregated density patterns. GeoInformatica 15(1):49–74
Article Google Scholar
Sakr MA, Güting RH (2011) Spatiotemporal pattern queries. GeoInformatica 15(3):497–540
Article Google Scholar
Samet H (2006) Foundations of multidimensional and metric data structures. Morgan Kaufmann
Sun C, Agrawal D, El Abbadi A (2002) Exploring spatial datasets with histograms. In: Proceedings of the 18th International Conference on Data Engineering, ICDE, pp 93–102
Sun C, Bandi N, Agrawal D, El Abbadi A (2006) Exploring spatial datasets with histograms. Distrib Parallel Databases 20(1):57–88
Article Google Scholar
Sweeney L (2002) k-anonymity: A model for protecting privacy. Int J Uncertainty Fuzziness Knowledge Based Syst 10(5):557–570
Article Google Scholar
Tao Y, Kollios G, Considine J, Li F, Papadias D (2004) Spatio-temporal aggregation using sketches. In: Proceedings of the 20th International Conference on Data Engineering, ICDE 2004, pp 214– 225
Tao Y, Papadias D, Zhang J (2002) Aggregate processing of planar points. In: Advances in Database Technology - EDBT 2002, 8th International Conference on Extending Database Technology, pp 682– 700
Timko I, Böhlen MH, Gamper J (2009) Sequenced spatio-temporal aggregation in road networks. In: EDBT 2009, 12th International Conference on Extending Database Technology, pp 48–59
Trudeau R (1993) Introduction to Graph Theory. Dover Books on Mathematics Series. Dover Pub
Viswanathan G, Schneider M (2011) On the requirements for user-centric spatial data warehousing and SOLAP. In: Database Systems for Adanced Applications - 16th International Conference, DASFAA 2011, International Workshops, pp 144–15
Wernke M, Skvortsov P, Du̇rr F, Rothermel K (2014) A classification of location privacy attacks and approaches. Pers Ubiquit Comput 18(1):163–175
Article Google Scholar
Willer DJ (1990) A spatial decision support system for bank location: A case study. Tech. rep., University of New York at Buffalo, Department of Geography State, National Center for Geographic Information and Analysis
Xie H, Kulik L, Tanin E (2010) Privacy-aware traffic monitoring. IEEE Trans Intell Transp Syst 11(1):61–70
Article Google Scholar
Xie H, Tanin E, Kulik L (2007) Distributed histograms for processing aggregate data from moving objects. In: 8th International Conference on Mobile Data Management (MDM 2007), pp 152– 157
Xie H, Tanin E, Kulik L, Scheuermann P, Trajcevski G, Fanaeepour M (2014) Euler histogram tree: A spatial data structure for aggregate range queries on vehicle trajectories. In: 7th ACM SIGSPATIAL International Workshop on Computational Transportation Science, IWCTS 2014
Xue AY, Qi J, Xie X, Zhang R, Huang J, Li Y (2015) Solving the data sparsity problem in destination prediction. The International Journal on Very Large Data Bases, VLDB J. 24(2):219–243
Article Google Scholar
Xue AY, Zhang R, Zheng Y, Xie X, Huang J, Xu Z (2013) Destination prediction by sub-trajectory synthesis and privacy protection against such prediction. 29th IEEE International Conference on Data Engineering, ICDE 2013, pp 254–265
Xue AY, Zhang R, Zheng Y, Xie X, Huang J, Xu Z (2013) Destination prediction by sub-trajectory synthesis and privacy protection against such prediction. In: 29th IEEE International Conference on Data Engineering, ICDE 2013, pp 254–265
Yaagoub A, Liu X, Trajcevski G, Tanin E, Scheuermann P (2012) Materialized views for count aggregates of spatial data. In: Advances in Databases and Information Systems - 16th East European Conference, ADBIS 2012, pp 427–440
Yuan J, Zheng Y, Xie X, Sun G (2011) Driving with knowledge from the physical world. In: Proceedings of the 17th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pp 316–324
Yuan J, Zheng Y, Zhang C, Xie W, Xie X, Sun G, Huang Y (2010) T-drive: driving directions based on taxi trajectories. In: 18th ACM SIGSPATIAL International Symposium on Advances in Geographic Information Systems, ACM-GIS 2010, pp 99–108

Download references

Acknowledgments

We thank Dr. Hairuo Xie for his helpful comments regarding the experimental study.

Author information

Authors and Affiliations

Department of Computing and Information Systems, University of Melbourne, Parkville, VIC, 3010, Australia
Maryam Fanaeepour, Lars Kulik, Egemen Tanin & Benjamin I. P. Rubinstein
National ICT Australia (NICTA), Sydney, Australia
Maryam Fanaeepour, Lars Kulik & Egemen Tanin

Authors

Maryam Fanaeepour
View author publications
You can also search for this author in PubMed Google Scholar
Lars Kulik
View author publications
You can also search for this author in PubMed Google Scholar
Egemen Tanin
View author publications
You can also search for this author in PubMed Google Scholar
Benjamin I. P. Rubinstein
View author publications
You can also search for this author in PubMed Google Scholar

Corresponding author

Correspondence to Maryam Fanaeepour.

Rights and permissions

Reprints and permissions

About this article

Cite this article

Fanaeepour, M., Kulik, L., Tanin, E. et al. The CASE histogram: privacy-aware processing of trajectory data using aggregates. Geoinformatica 19, 747–798 (2015). https://doi.org/10.1007/s10707-015-0228-8

Download citation

Received: 31 January 2014
Revised: 15 March 2015
Accepted: 22 April 2015
Published: 14 July 2015
Issue Date: October 2015
DOI: https://doi.org/10.1007/s10707-015-0228-8

Keywords

Access this article

Log in via an institution

Price excludes VAT (USA)
Tax calculation will be finalised during checkout.

Instant access to the full article PDF.

Institutional subscriptions

The CASE histogram: privacy-aware processing of trajectory data using aggregates

Abstract

Access this article

Similar content being viewed by others

Mobility Data Analytics with KNOT: The KNime mObility Toolkit

HV: A Feature Based Method for Trajectory Dataset Profiling

A differentially private algorithm for range queries on trajectories

Notes

References

Acknowledgments

Author information

Authors and Affiliations

Corresponding author

Rights and permissions

About this article

Cite this article

Keywords

Navigation

The CASE histogram: privacy-aware processing of trajectory data using aggregates

Abstract

Access this article

Similar content being viewed by others

Mobility Data Analytics with KNOT: The KNime mObility Toolkit

HV: A Feature Based Method for Trajectory Dataset Profiling

A differentially private algorithm for range queries on trajectories

Notes

References

Acknowledgments

Author information

Authors and Affiliations

Corresponding author

Rights and permissions

About this article

Cite this article

Share this article

Keywords

Search

Navigation