Enforcing Right to Be Forgotten in Cloud-Based Data Lakes

  • Conference paper
Advances in Information and Communication (FICC 2024)

Part of the book series: Lecture Notes in Networks and Systems (LNNS, volume 920)

Abstract

This paper uses metadata to enforce the right to be forgotten in large-scale data lakes. As cloud storage services are increasingly used to hold massive volumes of data, complying with data protection regulations such as the GDPR has become challenging. Implementing the right to be forgotten in cloud-based data lakes is complex because of the heterogeneous storage systems involved and their immutability properties. Existing solutions do not provide user-specific information, which motivates a practical approach. We present a novel solution that leverages metadata to address these challenges. It is faster, supports various file types, and generates user-specific PII reports for deletion or anonymization. An evaluation against existing tools demonstrates its effectiveness. By leveraging metadata, the solution ensures compliance with data protection laws and overcomes the challenges of diverse storage systems in cloud-based data lakes.
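To make the idea of a metadata-driven, user-specific PII report concrete, the following is a minimal sketch and not the authors' implementation. It assumes a hypothetical metadata catalog that records, for each file, which column identifies the user and which columns hold PII. The names `PiiEntry` and `build_pii_report`, the file paths, and the in-memory `lake` dictionary standing in for cloud storage are all illustrative assumptions.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PiiEntry:
    """Hypothetical catalog record: where a file keeps PII and how to find its owner."""
    path: str           # location of the file in the lake (illustrative)
    user_key: str       # column that identifies the data subject
    pii_columns: tuple  # columns flagged as PII in the metadata


def build_pii_report(catalog, lake, user_id):
    """Return one report row per PII value that belongs to user_id.

    Only files listed in the catalog are read, so the metadata decides
    which parts of the lake need to be scanned at all.
    """
    report = []
    for entry in catalog:
        for row_number, row in enumerate(lake.get(entry.path, [])):
            if row.get(entry.user_key) != user_id:
                continue
            for column in entry.pii_columns:
                if column in row:
                    report.append({
                        "path": entry.path,
                        "row": row_number,
                        "column": column,
                        "value": row[column],
                    })
    return report


# Toy data: an in-memory stand-in for files in a cloud data lake (assumption).
catalog = [
    PiiEntry("crm/customers.csv", "customer_id", ("email", "phone")),
    PiiEntry("logs/orders.json", "customer_id", ("shipping_address",)),
]
lake = {
    "crm/customers.csv": [
        {"customer_id": 7, "email": "a@example.com", "phone": "555-0100"},
        {"customer_id": 8, "email": "b@example.com", "phone": "555-0101"},
    ],
    "logs/orders.json": [
        {"customer_id": 7, "shipping_address": "1 Main St", "total": 12.5},
    ],
}

report = build_pii_report(catalog, lake, user_id=7)
for hit in report:
    print(hit["path"], hit["row"], hit["column"])
```

Each report row names a file, a row, and a column, which is the information a later deletion or anonymization step would need. How the catalog is populated, and how immutable objects are rewritten, is outside the scope of this sketch.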

Notes

  1. https://github.com/BlackCurrantDS/RTFCDL

Author information

Corresponding author

Correspondence to Sadeq Darrab.

Ethics declarations

Conflict of Interests

Ingo Klose is an employee of b.telligent company. All other authors have no conflicts of interest.

Copyright information

© 2024 The Author(s), under exclusive license to Springer Nature Switzerland AG

About this paper

Cite this paper

Bhardwaj, P., Darrab, S., Broneske, D., Klose, I., Saake, G. (2024). Enforcing Right to Be Forgotten in Cloud-Based Data Lakes. In: Arai, K. (ed.) Advances in Information and Communication. FICC 2024. Lecture Notes in Networks and Systems, vol 920. Springer, Cham. https://doi.org/10.1007/978-3-031-53963-3_16
