Robust and Imperceptible Watermarking Scheme for GWAS Data Traceability

  • Conference paper
Digital Forensics and Watermarking (IWDW 2022)

Part of the book series: Lecture Notes in Computer Science (LNCS, volume 13825)

Abstract

This paper proposes the first robust watermarking method for outsourced or shared genomic data in the context of genome-wide association studies (GWAS). Its primary purpose is to identify the individual or entity at the origin of an illegal redistribution or disclosure of information. The scheme has two distinguishing features. First, it employs a database watermarking strategy, taking advantage of the fact that GWAS data are stored in variant call format (VCF) files, which have a database-like structure. Second, it proposes a watermarking modulation for GWAS data based on quantization index modulation (QIM), under the constraint of not interfering with the identification of candidate variants or genes involved in the pathology. We evaluate the theoretical performance of our method in terms of watermark insertion capacity, distortion, and robustness against different attacks. Experiments conducted on real data and with the weighted-sum statistic (WSS) GWAS study demonstrate the efficiency of the proposed scheme and show that it can be used to identify the cloud service providers (geneticists) at the origin of an information disclosure even if the genotype data have been modified.
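For readers unfamiliar with quantization index modulation, the following is a minimal sketch of the textbook scalar form of the technique, not the modulation proposed in the paper. It assumes a real-valued host sample and a step size `delta`; the function names `qim_embed` and `qim_extract` are hypothetical. It does not model VCF genotype data or the constraint of preserving GWAS statistics described in the abstract.

```python
def qim_embed(value: float, bit: int, delta: float) -> float:
    """Move `value` to the nearest point of the lattice assigned to `bit`.

    Bit 0 uses the lattice {k * delta}; bit 1 uses the same lattice
    shifted by delta / 2. (Generic scalar QIM, assumed for illustration.)
    """
    offset = bit * delta / 2.0
    return round((value - offset) / delta) * delta + offset


def qim_extract(value: float, delta: float) -> int:
    """Return the bit whose lattice has the point closest to `value`."""
    dist0 = abs(value - qim_embed(value, 0, delta))
    dist1 = abs(value - qim_embed(value, 1, delta))
    return 0 if dist0 <= dist1 else 1


if __name__ == "__main__":
    delta = 1.0
    host = [0.12, 3.7, 10.0, 7.25]   # illustrative host values
    message = [1, 0, 1, 1]           # illustrative watermark bits
    marked = [qim_embed(v, b, delta) for v, b in zip(host, message)]
    # Perturb each marked value by less than delta / 4, then decode.
    attacked = [m + 0.2 for m in marked]
    print([qim_extract(a, delta) for a in attacked])
```

In this generic form, embedding changes each sample by at most `delta / 2`, and a bit is decoded correctly as long as the perturbation stays below `delta / 4`. A larger `delta` therefore trades more distortion for more robustness, the same tension between imperceptibility and robustness that the abstract refers to.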

R. Bellafqira and M. Al-Ghadi contributed equally.

This work was supported in part by the French Government support granted to the Labex CominLabs and managed by the ANR through the “Investing for the Future” Program under Grant ANR-10-LABX-07-01 through the project TADOP.

Corresponding author

Correspondence to Reda Bellafqira.

Copyright information

© 2023 The Author(s), under exclusive license to Springer Nature Switzerland AG

About this paper

Cite this paper

Bellafqira, R., Al-Ghadi, M., Genin, E., Coatrieux, G. (2023). Robust and Imperceptible Watermarking Scheme for GWAS Data Traceability. In: Zhao, X., Tang, Z., Comesaña-Alfaro, P., Piva, A. (eds) Digital Forensics and Watermarking. IWDW 2022. Lecture Notes in Computer Science, vol 13825. Springer, Cham. https://doi.org/10.1007/978-3-031-25115-3_10

  • DOI: https://doi.org/10.1007/978-3-031-25115-3_10

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-031-25114-6

  • Online ISBN: 978-3-031-25115-3

  • eBook Packages: Computer Science (R0)
