Personal and Ubiquitous Computing, Volume 22, Issue 2, pp 333–344

Data provenance to audit compliance with privacy policy in the Internet of Things

  • Thomas Pasquier
  • Jatinder Singh
  • Julia Powles
  • David Eyers
  • Margo Seltzer
  • Jean Bacon
Original Article


Managing privacy in the IoT presents a significant challenge. We make the case that information obtained by auditing the flows of data can assist in demonstrating that the systems handling personal data satisfy regulatory and user requirements. Thus, components handling personal data should be audited to demonstrate that their actions comply with all such policies and requirements. A valuable side-effect of this approach is that such an auditing process will highlight areas where technical enforcement has been incompletely or incorrectly specified. There is a clear role for technical assistance in aligning privacy policy enforcement mechanisms with data protection regulations. The first step necessary in producing technology to accomplish this alignment is to gather evidence of data flows. We describe our work producing, representing and querying audit data and discuss outstanding challenges.



This work was supported by the US National Science Foundation under grant SSI-1450277 End-to-End Provenance, and the UK Engineering and Physical Sciences Research Council grant EP/K011510 CloudSafetyNet: End-to-end application security in the cloud. Cambridge authors acknowledge the support of Microsoft through the Microsoft Cloud Computing Research Centre.



Copyright information

© Springer-Verlag London Ltd. 2017

Authors and Affiliations

  • Thomas Pasquier (1)
  • Jatinder Singh (2)
  • Julia Powles (3)
  • David Eyers (4)
  • Margo Seltzer (1)
  • Jean Bacon (2)

  1. Center for Research on Computation and Society, Harvard University, Cambridge, USA
  2. Computer Laboratory, University of Cambridge, Cambridge, UK
  3. Computing and Information Science, Cornell Tech, New York, USA
  4. Department of Computer Science, University of Otago, Dunedin, New Zealand
