GeoInformatica, Volume 19, Issue 4, pp 747–798

The CASE histogram: privacy-aware processing of trajectory data using aggregates

  • Maryam Fanaeepour
  • Lars Kulik
  • Egemen Tanin
  • Benjamin I. P. Rubinstein

Abstract

Due to the high uptake of location-based services (LBSs), large spatio-temporal datasets of moving objects’ trajectories are being created every day. An important task in spatial data analytics is to service range queries by returning trajectory counts within a queried region. How to keep an individual user’s data private whilst enabling spatial data analytics by third parties has become an urgent research question, and it is increasingly a concern for users. To preserve privacy, we discard individual trajectories and aggregate counts over a spatial and temporal partition. However, the privacy gained comes at a cost to utility: trajectories that pass through multiple cells and re-enter a query region lead to inaccurate query responses. This is known as the distinct counting problem. We propose the Connection Aware Spatial Euler (CASE) histogram to address this long-standing problem. The CASE histogram maintains the connectivity of a moving object's path, but does not require the ID of an object to distinguish multiple entries into an arbitrary query region. Our approach is to process trajectories offline into aggregate counts, which are sent to third parties in place of the original trajectories. We also explore modifications of our aggregate counting approach that preserve differential privacy. Theoretically and experimentally, we demonstrate that our method provides a high level of accuracy compared to the best known methods for the distinct counting problem, whilst preserving privacy. We conduct our experiments on both synthetic and real datasets against two competing Euler histogram-based methods presented in the literature. Our methods improve accuracy by 10% to 70%, depending on trip data and query region size, with the greatest increase seen on the Microsoft T-Drive real dataset, representing a more than tripling of accuracy.
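To make the distinct counting problem concrete, the following is a minimal sketch of a classic Euler-style histogram on a grid. It is not the CASE histogram itself. It assumes each trajectory is given as a sequence of grid cells in which consecutive cells share an edge, and that no path passes exactly through a cell corner, so vertex counts can be left out. The function names and the sample data are hypothetical, chosen for illustration only.

```python
from collections import Counter


def build_euler_histogram(trajectories):
    """Aggregate cell-sequence trajectories into per-cell and per-edge counts.

    No object IDs are stored: each trajectory adds 1 to every cell it visits
    and 1 to every shared edge it crosses between consecutive cells.
    """
    faces, edges = Counter(), Counter()
    for path in trajectories:
        faces.update(set(path))
        crossed = {frozenset((a, b)) for a, b in zip(path, path[1:]) if a != b}
        edges.update(crossed)
    return faces, edges


def in_region(cell, region):
    """True if cell (x, y) lies in the inclusive rectangle ((x0, y0), (x1, y1))."""
    (x0, y0), (x1, y1) = region
    return x0 <= cell[0] <= x1 and y0 <= cell[1] <= y1


def naive_count(faces, region):
    """Sum of cell counts: overcounts any trajectory spanning several cells."""
    return sum(n for cell, n in faces.items() if in_region(cell, region))


def euler_count(faces, edges, region):
    """Cells minus interior edges: 1 per connected, cycle-free piece of a path."""
    interior = sum(
        n for edge, n in edges.items() if all(in_region(c, region) for c in edge)
    )
    return naive_count(faces, region) - interior


# Hypothetical data: one straight path, and one path that leaves column x=0
# and later re-enters it.
straight = [(0, 0), (1, 0), (2, 0)]
reentry = [(0, 0), (1, 0), (1, 1), (1, 2), (0, 2)]
faces, edges = build_euler_histogram([straight, reentry])

row = ((0, 0), (2, 0))     # both trajectories cross this row once
column = ((0, 0), (0, 2))  # the second trajectory enters this column twice

print(naive_count(faces, row), euler_count(faces, edges, row))
print(naive_count(faces, column), euler_count(faces, edges, column))
```

For the row query, the naive sum reports 5 while the Euler-style count reports the true value of 2. For the column query, the Euler-style count reports 3 although only 2 distinct trajectories are present, because the re-entering path contributes two disconnected pieces. This re-entry error is the case the abstract says the CASE histogram is designed to handle.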

Keywords

Aggregate data; Count information; Differential privacy; Distinct counting problem; Euler histograms; Location privacy; Spatial databases; Spatial data analytics

Copyright information

© Springer Science+Business Media New York 2015

Authors and Affiliations

  1. Department of Computing and Information Systems, University of Melbourne, Parkville, Australia
  2. National ICT Australia (NICTA), Sydney, Australia
