Encyclopedia of Big Data Technologies

2019 Edition
Editors: Sherif Sakr, Albert Y. Zomaya

Data Provenance for Big Data Security and Accountability

  • Thye Way Phua
  • Ryan K. L. Ko
Reference work entry
DOI: https://doi.org/10.1007/978-3-319-77525-8_237



Provenance is the derivation history of data (Ko et al. 2015; Ko and Will 2014). Provenance does not by itself uphold or enforce the information security requirements (confidentiality, integrity, and availability) in Big Data security. However, provenance and its sources (e.g., metadata, lineage, and data activities such as create, read, update, and delete) provide verification and historical evidence that support security analysis and forecasting. For example, analyzing provenance helps in understanding and preventing outages (Ko et al. 2012), which improves availability. Provenance also contributes to data forensics, especially the study of data activity patterns triggered by software or human processes, such as ransomware (Ko et al. 2015). The lineage and metadata describing provenance also provide substantial evidence for transparency and data accountability (Ko 2014...
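To make the ideas of lineage and data activity patterns concrete, the following is a minimal Python sketch, not the authors' method or any tool cited in this entry. It assumes provenance is available as a flat list of create, read, update, and delete events. The record type `ProvenanceEvent`, the helpers `lineage` and `bursty_writers`, the file names, and the window and threshold values are all hypothetical choices made for illustration.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class ProvenanceEvent:
    """One hypothetical provenance record for a single data activity."""
    timestamp: float               # seconds since an arbitrary epoch
    actor: str                     # process or user that performed the activity
    activity: str                  # "create", "read", "update", or "delete"
    target: str                    # data item acted on
    source: Optional[str] = None   # item the target was derived from, if any


def lineage(events, item):
    """Follow derivation links backwards from item to its earliest known ancestor."""
    parent = {e.target: e.source for e in events
              if e.activity == "create" and e.source}
    chain = [item]
    # Stop at an item with no recorded parent, or if a cycle would be revisited.
    while chain[-1] in parent and parent[chain[-1]] not in chain:
        chain.append(parent[chain[-1]])
    return chain


def bursty_writers(events, window=10.0, threshold=3):
    """Return actors that updated or deleted at least `threshold` distinct
    items within any `window`-second span (an illustrative heuristic only)."""
    by_actor = defaultdict(list)
    for e in sorted(events, key=lambda e: e.timestamp):
        if e.activity in ("update", "delete"):
            by_actor[e.actor].append(e)
    flagged = set()
    for actor, evs in by_actor.items():
        start = 0
        for end in range(len(evs)):
            # Shrink the window from the left until it spans at most `window` seconds.
            while evs[end].timestamp - evs[start].timestamp > window:
                start += 1
            if len({e.target for e in evs[start:end + 1]}) >= threshold:
                flagged.add(actor)
                break
    return flagged


# Made-up sample log: a small derivation chain, then a burst of rewrites.
events = [
    ProvenanceEvent(0.0, "alice", "create", "raw.csv"),
    ProvenanceEvent(5.0, "etl", "read", "raw.csv"),
    ProvenanceEvent(6.0, "etl", "create", "clean.csv", source="raw.csv"),
    ProvenanceEvent(9.0, "report", "create", "summary.csv", source="clean.csv"),
    ProvenanceEvent(100.0, "proc-4242", "update", "raw.csv"),
    ProvenanceEvent(101.0, "proc-4242", "update", "clean.csv"),
    ProvenanceEvent(102.0, "proc-4242", "update", "summary.csv"),
    ProvenanceEvent(500.0, "alice", "update", "raw.csv"),
]

print(lineage(events, "summary.csv"))
print(bursty_writers(events))
```

In this sketch, `lineage` supplies the historical evidence for where a data item came from, while `bursty_writers` flags a process that rewrites many items in a short span, a pattern loosely resembling ransomware activity. A real deployment would need far richer records and tamper-evident collection than this flat list assumes.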



  1. Agrawal D, Bernstein P, Bertino E, Davidson S, Dayal U, Franklin M, Gehrke J, Haas L, Halevy A, Han J, Jagadish HV, Labrinidis A, Madden S, Papakonstantinou Y, Patel JM, Ramakrishnan R, Ross K, Shahabi C, Suciu D, Vaithyanathan S, Widom J (2012) Challenges and opportunities with big data: a white paper prepared for the computing community consortium committee of the Computing Research Association. Technical report. https://cra.org/ccc/wp-content/uploads/sites/2/2015/05/bigdatawhitepaper.pdf
  2. Fu X, Gao Y, Luo B, Du X, Guizani M (2017) Security threats to Hadoop: data leakage attacks and investigation. IEEE Netw 31(2):67–71
  3. Ko RKL (2014) Data accountability in cloud systems. In: Nepal S, Pathan M (eds) Security, privacy and trust in cloud systems. Springer, Berlin, pp 211–238
  4. Ko RKL, Phua TW (2017) The full provenance stack: five layers for complete and meaningful provenance. In: Proceedings of the security, privacy and anonymity in computation, communication and storage: SpaCCS 2017 international workshops, UbiSafe, ISSR, TrustData, TSP, SPIoT, NOPE, DependSys, SCS, WCSSC, MSCF and SPBD, 12–15 Dec 2017. Springer, Guangzhou
  5. Ko RKL, Will MA (2014) Progger: an efficient, tamper-evident kernel-space logger for cloud data provenance tracking. In: Proceedings of the IEEE international conference on cloud computing, CLOUD ’14. IEEE Computer Society, Washington, DC, pp 881–889. https://doi.org/10.1109/CLOUD.2014.121
  6. Ko RKL, Jagadpramana P, Lee BS (2011) Flogger: a file-centric logger for monitoring file access and transfers within cloud computing environments. In: Proceedings of the IEEE 10th international conference on trust, security and privacy in computing and communications, TRUSTCOM ’11. IEEE Computer Society, Washington, DC, pp 765–771. https://doi.org/10.1109/TrustCom.2011.100
  7. Ko RKL, Lee SSG, Rajan V (2012) Understanding cloud failures. IEEE Spectr 49(12):84–84. https://doi.org/10.1109/MSPEC.2012.6361788
  8. Ko RKL, Russello G, Nelson R, Pang S, Cheang A, Dobbie G, Sarrafzadeh A, Chaisiri S, Asghar MR, Holmes G (2015) Stratus: towards returning data control to cloud users. In: International conference on algorithms and architectures for parallel processing. Springer, pp 57–70
  9. Muniswamy-Reddy KK, Holland DA, Braun U, Seltzer MI (2006) Provenance-aware storage systems. In: USENIX annual technical conference, general track, pp 43–56
  10. Xie Y, Muniswamy-Reddy KK, Feng D, Li Y, Long DDE (2013) Evaluation of a hybrid approach for efficient provenance storage. Trans Storage 9(4):14:1–14:29. https://doi.org/10.1145/2501986

Copyright information

© Springer Nature Switzerland AG 2019

Authors and Affiliations

  1. Cyber Security Lab – Department of Computer Science, University of Waikato, Hamilton, New Zealand