Abstract
This paper uses metadata to enforce the right to be forgotten in large-scale data lakes. As cloud storage services are increasingly used to store massive volumes of data, ensuring compliance with data protection regulations such as the GDPR has become challenging. Implementing the right to be forgotten in cloud-based data lakes is complex because of the diversity of storage systems and their immutability properties. Existing solutions do not provide user-specific information, which underlines the need for a practical approach. This paper presents a novel solution that leverages metadata to address these challenges. Our solution is faster, supports various file types, and generates user-specific PII reports for deletion or anonymization. An evaluation against existing tools demonstrates its effectiveness.
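The abstract does not describe the implementation, so the following is only a minimal sketch of what a metadata-driven, user-specific PII report could look like. It is not the authors' method. The catalog layout, the names `ColumnMeta`, `FileMeta`, and `pii_report`, and the example paths are all hypothetical and chosen purely for illustration.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Set


@dataclass(frozen=True)
class ColumnMeta:
    """Hypothetical per-column metadata; pii_type is None for non-PII columns."""
    name: str
    pii_type: Optional[str] = None


@dataclass
class FileMeta:
    """Hypothetical per-file metadata entry in a data lake catalog."""
    path: str
    file_format: str
    columns: List[ColumnMeta]
    # Assumption: the catalog records which user IDs occur in each file,
    # so the files themselves need not be scanned at request time.
    user_ids: Set[str] = field(default_factory=set)


def pii_report(catalog: List[FileMeta], user_id: str) -> List[Dict[str, str]]:
    """List every (file, column) location that holds PII for one user."""
    report = []
    for entry in catalog:
        if user_id not in entry.user_ids:
            continue  # this file holds no data for the user
        for col in entry.columns:
            if col.pii_type is not None:
                report.append({
                    "path": entry.path,
                    "format": entry.file_format,
                    "column": col.name,
                    "pii_type": col.pii_type,
                })
    return report


# Illustrative catalog with made-up paths and users.
catalog = [
    FileMeta(
        path="s3://example-lake/customers.parquet",
        file_format="parquet",
        columns=[
            ColumnMeta("customer_id"),
            ColumnMeta("email", "email"),
            ColumnMeta("phone", "phone_number"),
        ],
        user_ids={"u42", "u7"},
    ),
    FileMeta(
        path="s3://example-lake/orders.csv",
        file_format="csv",
        columns=[ColumnMeta("order_id"), ColumnMeta("ship_address", "address")],
        user_ids={"u7"},
    ),
]

report = pii_report(catalog, "u42")
for location in report:
    print(location)
```

Under these assumptions, the report for a user is produced from the catalog alone, and each entry names a location that a separate deletion or anonymization step could then act on.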
Ethics declarations
Conflict of Interests
Ingo Klose is an employee of b.telligent company. All other authors have no conflicts of interest.
Copyright information
© 2024 The Author(s), under exclusive license to Springer Nature Switzerland AG
About this paper
Cite this paper
Bhardwaj, P., Darrab, S., Broneske, D., Klose, I., Saake, G. (2024). Enforcing Right to Be Forgotten in Cloud-Based Data Lakes. In: Arai, K. (eds) Advances in Information and Communication. FICC 2024. Lecture Notes in Networks and Systems, vol 920. Springer, Cham. https://doi.org/10.1007/978-3-031-53963-3_16
DOI: https://doi.org/10.1007/978-3-031-53963-3_16
Publisher Name: Springer, Cham
Print ISBN: 978-3-031-53962-6
Online ISBN: 978-3-031-53963-3
eBook Packages: Intelligent Technologies and Robotics (R0)