RDF Graph Anonymization Robust to Data Linkage
Abstract
Privacy is a major concern when publishing new datasets in the context of Linked Open Data (LOD). A dataset newly published in the LOD is exposed to privacy breaches through linkage with objects already present in other LOD datasets. In this paper, we focus on the problem of building safe anonymizations of an RDF graph, which guarantee that linking the anonymized graph with any external RDF graph will not cause privacy breaches. Given a set of privacy queries as input, we study the data-independent safety problem and the sequence of anonymization operations necessary to enforce it. We provide sufficient conditions under which an anonymization instance is safe with respect to a set of privacy queries. Additionally, we show that our algorithms for RDF data anonymization are robust in the presence of sameAs links, whether explicit or inferred from additional knowledge.
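The linkage risk described above can be illustrated with a toy sketch. It is not the algorithm of the paper: the triples, the privacy query, and the helper names (`match`, `answers`, `leaked`) are all hypothetical, a breach is simplified to "the privacy query returns a non-blank answer", and the check is run against a single external graph, whereas the safety notion of the abstract quantifies over any external graph.

```python
# Toy model: an RDF graph is a set of (subject, predicate, object) strings.
# Terms starting with "?" are query variables, terms starting with "_:" are
# blank nodes. All names and data here are illustrative assumptions.

def match(pattern, triple, binding):
    """Extend `binding` so that `pattern` maps onto `triple`, or return None."""
    new = dict(binding)
    for p, t in zip(pattern, triple):
        if p.startswith("?"):
            # Bind the variable, or fail if it is already bound differently.
            if new.setdefault(p, t) != t:
                return None
        elif p != t:
            return None
    return new

def answers(var, patterns, graph):
    """Evaluate a conjunctive query (list of triple patterns) over `graph`."""
    bindings = [{}]
    for pattern in patterns:
        extended = []
        for b in bindings:
            for triple in graph:
                b2 = match(pattern, triple, b)
                if b2 is not None:
                    extended.append(b2)
        bindings = extended
    return {b[var] for b in bindings}

def leaked(var, patterns, anonymized, external):
    """Non-blank answers to the privacy query over the linked graphs."""
    linked = anonymized | external
    return {a for a in answers(var, patterns, linked) if not a.startswith("_:")}

# Privacy query: which persons are treated by someone working in oncology?
VAR = "?p"
PATTERNS = [("?p", ":treatedBy", "?d"), ("?d", ":worksIn", ":oncology")]

original = {
    (":alice", ":treatedBy", ":drBob"),
    (":drBob", ":worksIn", ":oncology"),
}
# An external graph that happens to contain the fact removed below.
external = {(":drBob", ":worksIn", ":oncology")}

# Naive anonymization: delete one triple so the query has no answer locally.
naive = original - {(":drBob", ":worksIn", ":oncology")}
# Alternative: replace the sensitive subject by a blank node.
blanked = {
    ("_:b0", ":treatedBy", ":drBob"),
    (":drBob", ":worksIn", ":oncology"),
}

print(answers(VAR, PATTERNS, naive))            # set()
print(leaked(VAR, PATTERNS, naive, external))   # {':alice'}
print(leaked(VAR, PATTERNS, blanked, external)) # set()
```

In this sketch the naive anonymization looks safe in isolation, but the union with the external graph restores the deleted fact and re-exposes `:alice`. Replacing the subject with a blank node leaves no named answer for the external graph to complete.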
Keywords
Linked Open Data, Data privacy, RDF anonymization
Acknowledgements
This work has been supported by the Auvergne-Rhône-Alpes region through the ARC6 research program funding Remy Delanaux’s PhD; by the LabEx PERSYVAL-Lab (ANR-11-LABX-0025-01); by the SIDES 3.0 project (ANR-16-DUNE-0002) funded by the French Programme Investissement d’Avenir (PIA); and by the Palse Impulsion 2016/31 program (ANR-11-IDEX-0007-02) at UDL.