Data Provenance for Big Data Security and Accountability
Provenance is the derivative history of data (Ko et al. 2015; Ko and Will 2014). In the context of Big Data security, provenance does not directly uphold or enforce the information security requirements of confidentiality, integrity, and availability. However, provenance and its sources (e.g., metadata, lineage, and data activities such as create, read, update, and delete) provide verification and historical evidence that support analysis and forecasting for data security. For example, analyzing provenance helps to understand and prevent outages (Ko et al. 2012) and thereby improve availability. Provenance also contributes strongly to data forensics, especially the study of data activity patterns triggered by software or human processes, such as ransomware (Ko et al. 2015). The lineage and metadata describing provenance also provide substantial evidence for transparency and data accountability (Ko 2014...
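To make the idea of data activities as provenance evidence concrete, the following is a minimal sketch in Python. It assumes a hypothetical record format (`ActivityRecord`), hypothetical helper names (`history_of`, `suspicious_actors`), and an arbitrary threshold; none of these come from the cited works, and the flagging rule is a deliberately naive stand-in for real activity-pattern analysis.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class ActivityRecord:
    """One hypothetical provenance entry: who did what to which data object, and when."""
    timestamp: float  # illustrative time value; the unit is an assumption
    actor: str        # process or user that triggered the activity
    action: str       # one of "create", "read", "update", "delete"
    path: str         # data object the activity touched


def history_of(records, path):
    """Return the time-ordered activity history of a single data object."""
    return sorted((r for r in records if r.path == path), key=lambda r: r.timestamp)


def suspicious_actors(records, min_objects=3):
    """Flag actors that updated or deleted at least `min_objects` distinct objects.

    The threshold is an illustrative assumption, not a tuned detector.
    """
    touched = defaultdict(set)
    for r in records:
        if r.action in ("update", "delete"):
            touched[r.actor].add(r.path)
    return sorted(a for a, paths in touched.items() if len(paths) >= min_objects)


# Made-up example log, for illustration only.
log = [
    ActivityRecord(1.0, "alice", "create", "/data/a.csv"),
    ActivityRecord(2.0, "alice", "read", "/data/a.csv"),
    ActivityRecord(3.0, "proc-42", "update", "/data/a.csv"),
    ActivityRecord(3.1, "proc-42", "update", "/data/b.csv"),
    ActivityRecord(3.2, "proc-42", "delete", "/data/c.csv"),
]

print([r.action for r in history_of(log, "/data/a.csv")])
print(suspicious_actors(log))
```

In this sketch, `history_of` reconstructs the derivation history of one object from the log, while `suspicious_actors` shows how the same records could support forensic questions about which process modified many objects.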
- Agrawal D, Bernstein P, Bertino E, Davidson S, Dayal U, Franklin M, Gehrke J, Haas L, Halevy A, Han J, Jagadish HV, Labrinidis A, Madden S, Papakonstantinou Y, Patel JM, Ramakrishnan R, Ross K, Shahabi C, Suciu D, Vaithyanathan S, Widom J (2012) Challenges and opportunities with big data: a white paper prepared for the computing community consortium committee of the Computing Research Association. Technical report. https://cra.org/ccc/wp-content/uploads/sites/2/2015/05/bigdatawhitepaper.pdf
- Ko RKL, Phua TW (2017) The full provenance stack: five layers for complete and meaningful provenance. In: Proceedings of the security, privacy and anonymity in computation, communication and storage: SpaCCS 2017 international workshops, UbiSafe, ISSR, TrustData, TSP, SPIoT, NOPE, DependSys, SCS, WCSSC, MSCF and SPBD, 12–15 Dec 2017. Springer, Guangzhou
- Ko RKL, Will MA (2014) Progger: an efficient, tamper-evident kernel-space logger for cloud data provenance tracking. In: Proceedings of the IEEE international conference on cloud computing, CLOUD ’14. IEEE Computer Society, Washington, DC, pp 881–889. https://doi.org/10.1109/CLOUD.2014.121
- Ko RKL, Jagadpramana P, Lee BS (2011) Flogger: a file-centric logger for monitoring file access and transfers within cloud computing environments. In: Proceedings of the IEEE 10th international conference on trust, security and privacy in computing and communications, TRUSTCOM ’11. IEEE Computer Society, Washington, DC, pp 765–771. https://doi.org/10.1109/TrustCom.2011.100
- Ko RKL, Russello G, Nelson R, Pang S, Cheang A, Dobbie G, Sarrafzadeh A, Chaisiri S, Asghar MR, Holmes G (2015) Stratus: towards returning data control to cloud users. In: International conference on algorithms and architectures for parallel processing. Springer, pp 57–70
- Muniswamy-Reddy KK, Holland DA, Braun U, Seltzer MI (2006) Provenance-aware storage systems. In: USENIX annual technical conference, general track. pp 43–56