Data provenance to audit compliance with privacy policy in the Internet of Things


Managing privacy in the IoT presents a significant challenge. We make the case that information obtained by auditing the flows of data can assist in demonstrating that the systems handling personal data satisfy regulatory and user requirements. Thus, components handling personal data should be audited to demonstrate that their actions comply with all such policies and requirements. A valuable side-effect of this approach is that such an auditing process will highlight areas where technical enforcement has been incompletely or incorrectly specified. There is a clear role for technical assistance in aligning privacy policy enforcement mechanisms with data protection regulations. The first step necessary in producing technology to accomplish this alignment is to gather evidence of data flows. We describe our work producing, representing and querying audit data and discuss outstanding challenges.
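As a rough illustration of what querying audit data for compliance might look like, the sketch below records observed data flows as edges of a small graph and reports components that received data they were not permitted to receive. It is a minimal sketch under stated assumptions, not the system described in the article: the component names, the `FLOWS` and `ALLOWED` structures, and the functions `build_graph`, `reachable` and `violations` are all hypothetical.

```python
from collections import defaultdict, deque

# Hypothetical audit records: each (source, destination) pair means
# "data flowed from source to destination". All names are illustrative.
FLOWS = [
    ("heart_rate_sensor", "home_gateway"),
    ("home_gateway", "health_analytics"),
    ("health_analytics", "clinician_dashboard"),
    ("home_gateway", "ad_broker"),
]

# Hypothetical policy: components allowed to receive data derived from a source.
ALLOWED = {
    "heart_rate_sensor": {"home_gateway", "health_analytics", "clinician_dashboard"},
}


def build_graph(flows):
    """Index the audit records as an adjacency list (a simple audit graph)."""
    graph = defaultdict(set)
    for src, dst in flows:
        graph[src].add(dst)
    return graph


def reachable(graph, start):
    """Return every component that data from `start` reached, directly or transitively."""
    seen = {start}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        for nxt in graph.get(node, ()):
            if nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    return seen - {start}


def violations(flows, allowed):
    """Map each policy-covered source to the components it reached without permission."""
    graph = build_graph(flows)
    report = {}
    for source, permitted in allowed.items():
        extra = reachable(graph, source) - permitted
        if extra:
            report[source] = extra
    return report


if __name__ == "__main__":
    print(violations(FLOWS, ALLOWED))  # {'heart_rate_sensor': {'ad_broker'}}
```

The query follows flows transitively, so a component that receives personal data through an intermediary is still reported. A flagged flow may indicate either a genuine breach or a policy that was incompletely specified, which is the side-effect of auditing that the abstract points to.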




  2. As stated in Section 2, building on standards is of fundamental importance for the interoperability of IoT systems.

  3. PBAC uses provenance information to make primary data access decisions, while PAC controls access to the provenance data itself.

  4. Apache Accumulo is a scalable open-source key/value store implementation based on the design of Google’s BigTable.

  6. ISO/IEC 20922:2016

  8. Code Civil Article 1316-1.




This work was supported by the US National Science Foundation under grant SSI-1450277 End-to-End Provenance, and the UK Engineering and Physical Sciences Research Council grant EP/K011510 CloudSafetyNet: End-to-end application security in the cloud. Cambridge authors acknowledge the support of Microsoft through the Microsoft Cloud Computing Research Centre.

Author information



Corresponding author

Correspondence to Thomas Pasquier.

Additional information

The software artefacts corresponding to our work to date are available online under a GNU General Public Licence at


About this article


Cite this article

Pasquier, T., Singh, J., Powles, J. et al. Data provenance to audit compliance with privacy policy in the Internet of Things. Pers Ubiquit Comput 22, 333–344 (2018).



  • Data Provenance
  • Enforcement Techniques
  • Provenance Graph
  • Audit Graph
  • Data Protection Law