GeoInformatica, Volume 19, Issue 4, pp 747–798

The CASE histogram: privacy-aware processing of trajectory data using aggregates

  • Maryam Fanaeepour
  • Lars Kulik
  • Egemen Tanin
  • Benjamin I. P. Rubinstein
Article

DOI: 10.1007/s10707-015-0228-8

Cite this article as:
Fanaeepour, M., Kulik, L., Tanin, E. et al. Geoinformatica (2015) 19: 747. doi:10.1007/s10707-015-0228-8

Abstract

Due to the high uptake of location-based services (LBSs), large spatio-temporal datasets of moving objects’ trajectories are being created every day. An important task in spatial data analytics is to service range queries by returning trajectory counts within a queried region. How to keep an individual user’s data private whilst enabling spatial data analytics by third parties has become an urgent research question, and it is increasingly a concern for users. To preserve privacy, we discard individual trajectories and aggregate counts over a spatial and temporal partition. However, the privacy gained comes at a cost to utility: trajectories that pass through multiple cells and re-enter a query region lead to inaccurate query responses. This is known as the distinct counting problem. We propose the Connection Aware Spatial Euler (CASE) histogram to address this long-standing problem. The CASE histogram maintains the connectivity of a moving object’s path, but does not require the ID of an object to distinguish multiple entries into an arbitrary query region. Our approach is to process trajectories offline into aggregate counts, which are sent to third parties in place of the original trajectories. We also explore modifications of our aggregate counting approach that preserve differential privacy. Theoretically and experimentally, we demonstrate that our method provides a high level of accuracy compared to the best known methods for the distinct counting problem, whilst preserving privacy. We conduct our experiments on both synthetic and real datasets, comparing against two competitive Euler histogram-based methods presented in the literature. Our methods improve accuracy by 10% to 70%, depending on trip data and query region size, with the greatest increase seen on the Microsoft T-Drive real dataset, representing a more than tripling of accuracy.
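
The distinct counting problem described in the abstract can be illustrated with a small sketch. The code below is not the CASE histogram and is not taken from the article; it is a simplified, Euler-style aggregate under stated assumptions: trajectories are sequences of 4-adjacent grid cells that never revisit a cell, a query region is an axis-aligned rectangle of cells, and all names (`build_aggregates`, `naive_count`, `euler_style_count`, `TRAJECTORIES`) are hypothetical. It stores only per-cell and per-crossing counts, with no object IDs, and shows that subtracting crossings removes the overcount from multi-cell paths but not the overcount caused by re-entry.

```python
from collections import Counter

# Two hypothetical trajectories on a grid, given as sequences of (x, y) cells.
TRAJECTORIES = [
    [(0, 0), (1, 0), (2, 0)],                  # stays in rows y = 0
    [(0, 1), (1, 1), (1, 2), (2, 2), (2, 1)],  # leaves row y = 1, then returns to it
]


def build_aggregates(trajectories):
    """Reduce trajectories to per-cell and per-crossing counts (no IDs kept)."""
    cell_counts = Counter()
    crossing_counts = Counter()
    for path in trajectories:
        # Each trajectory contributes once to every cell it visits ...
        cell_counts.update(set(path))
        # ... and once to every boundary it crosses between consecutive cells.
        crossing_counts.update({frozenset(p) for p in zip(path, path[1:])})
    return cell_counts, crossing_counts


def in_region(cell, region):
    """True if the cell lies in the inclusive rectangle ((x0, y0), (x1, y1))."""
    (x0, y0), (x1, y1) = region
    x, y = cell
    return x0 <= x <= x1 and y0 <= y <= y1


def naive_count(cell_counts, region):
    """Sum of cell counts: counts a trajectory once per cell it touches."""
    return sum(n for cell, n in cell_counts.items() if in_region(cell, region))


def euler_style_count(cell_counts, crossing_counts, region):
    """Cells minus crossings fully inside the region.

    For paths that never revisit a cell, this equals the number of connected
    path pieces inside the region, so a re-entering path is counted twice.
    """
    inside = sum(
        n for crossing, n in crossing_counts.items()
        if all(in_region(c, region) for c in crossing)
    )
    return naive_count(cell_counts, region) - inside


cells, crossings = build_aggregates(TRAJECTORIES)
small = ((0, 0), (2, 1))  # second trajectory exits and re-enters this region
large = ((0, 0), (2, 2))  # both trajectories lie entirely inside this region
print(naive_count(cells, small), euler_style_count(cells, crossings, small))
print(naive_count(cells, large), euler_style_count(cells, crossings, large))
```

In both regions the true number of distinct trajectories is 2. In the larger region the cell-minus-crossing count recovers it, whereas in the smaller region the re-entering trajectory is split into two pieces and the count is too high. According to the abstract, this re-entry case is what the CASE histogram is designed to handle without using object IDs.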

Keywords

Aggregate data; Count information; Differential privacy; Distinct counting problem; Euler histograms; Location privacy; Spatial databases; Spatial data analytics

Copyright information

© Springer Science+Business Media New York 2015

Authors and Affiliations

  1. Department of Computing and Information Systems, University of Melbourne, Parkville, Australia
  2. National ICT Australia (NICTA), Sydney, Australia
