Abstract
This paper proposes the first robust watermarking method for outsourced or shared genomic data in the context of genome-wide association studies (GWAS), with the primary purpose of identifying the individual or entity at the origin of an illegal redistribution or disclosure of information. The scheme has two distinguishing features. First, it employs a database watermarking strategy, taking advantage of the fact that GWAS data are stored in variant call format (VCF) files, which have a database-like structure. Second, it proposes a watermarking modulation based on quantization index modulation (QIM) for GWAS data, under the constraint that the watermark must not interfere with the identification of candidate variants or genes involved in the pathology. We evaluate the theoretical performance of our method in terms of watermark insertion capacity, distortion, and robustness against different attacks. Experiments conducted on real data with the weighted-sum statistic (WSS) GWAS study demonstrate the efficiency of the proposed scheme and show that it can be used to identify the cloud service providers (geneticists) at the origin of an information disclosure even if the genotype data have been modified.
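For readers unfamiliar with quantization index modulation, the following is a minimal sketch of generic scalar QIM applied to plain numeric values. It is not the modulation proposed in the paper, whose details the abstract does not give. The function names `qim_embed` and `qim_detect`, the step size `delta`, and the sample values are all assumptions chosen for illustration.

```python
import math


def qim_embed(x: float, bit: int, delta: float) -> float:
    """Quantize x onto the lattice associated with `bit`.

    Bit 0 uses the lattice {k * delta}; bit 1 uses the same lattice
    shifted by delta / 2. The distortion |x - result| is at most delta / 2.
    """
    offset = bit * delta / 2.0
    # floor(v + 0.5) rounds to nearest without Python's banker's rounding.
    k = math.floor((x - offset) / delta + 0.5)
    return k * delta + offset


def qim_detect(y: float, delta: float) -> int:
    """Return the bit whose lattice lies closest to the received value y."""
    d0 = abs(y - qim_embed(y, 0, delta))
    d1 = abs(y - qim_embed(y, 1, delta))
    return 0 if d0 <= d1 else 1


# Hypothetical host values and message bits, for illustration only.
values = [3.2, 7.9, 12.4, 0.6]
bits = [1, 0, 1, 1]
delta = 2.0

marked = [qim_embed(v, b, delta) for v, b in zip(values, bits)]
# A perturbation smaller than delta / 4 does not change the decoded bits.
attacked = [m + 0.3 for m in marked]
decoded = [qim_detect(a, delta) for a in attacked]
```

A larger `delta` tolerates larger modifications of the marked values but distorts the host data more. This is the same capacity, distortion, and robustness trade-off that the abstract says is evaluated.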
Keywords
- Information security
- Genome-wide association studies (GWAS)
- Traceability
- Watermarking
- Genomic data
R. Bellafqira and M. Al-Ghadi—Equal contributions.
This work was supported in part by the French Government, through funding granted to the Labex CominLabs and managed by the ANR under the “Investing for the Future” Program, Grant ANR-10-LABX-07-01, through the project TADOP.
Copyright information
© 2023 The Author(s), under exclusive license to Springer Nature Switzerland AG
About this paper
Cite this paper
Bellafqira, R., Al-Ghadi, M., Genin, E., Coatrieux, G. (2023). Robust and Imperceptible Watermarking Scheme for GWAS Data Traceability. In: Zhao, X., Tang, Z., Comesaña-Alfaro, P., Piva, A. (eds) Digital Forensics and Watermarking. IWDW 2022. Lecture Notes in Computer Science, vol 13825. Springer, Cham. https://doi.org/10.1007/978-3-031-25115-3_10
DOI: https://doi.org/10.1007/978-3-031-25115-3_10
Publisher Name: Springer, Cham
Print ISBN: 978-3-031-25114-6
Online ISBN: 978-3-031-25115-3
eBook Packages: Computer Science, Computer Science (R0)