Side channel attack on a partially encrypted MPEG-G file

Abstract

New genome sequencing technologies have decreased the cost of generating genomic data, thus increasing storage needs. MPEG, a working group of the International Organization for Standardization (ISO), has developed a standard for genomic data compression that includes encryption features. The MPEG-G standard (ISO/IEC 23092) compresses genomic information by grouping similar data into streams. Given this design, one of the protection options considered was to encrypt each stream separately. In this paper, we show that if streams are encrypted separately, an attacker can use an unencrypted stream to deduce the encrypted content. We present two attacks, one based on signal processing and the other on neural networks. The signal-based attack works only under unrealistic settings, whereas the neural network-based attack recovers data under realistic settings of read length and coverage. These results led MPEG to reconsider the encryption strategy before final publication of the standard and to discard the separate-stream encryption approach.
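
The leakage risk of per-stream encryption can be illustrated with a toy sketch. It does not reproduce either attack from the paper, and it does not follow the actual MPEG-G stream layout: the stream names (`pos`, `len`, `seq`), the helper functions, and the XOR-based `toy_encrypt` stand-in cipher are all assumptions made for illustration. The sketch only shows that when one stream is encrypted and the others are left in the clear, an attacker without the key can still derive structural information (here, per-position read depth) about the protected data.

```python
import hashlib
from collections import Counter


def toy_encrypt(data: bytes, key: bytes) -> bytes:
    """Hypothetical stand-in for a real cipher (NOT secure).

    XORs the data with a SHA-256 counter keystream, so applying it
    twice with the same key returns the original bytes.
    """
    out = bytearray()
    for i in range(0, len(data), 32):
        block = hashlib.sha256(key + i.to_bytes(8, "big")).digest()
        out.extend(b ^ k for b, k in zip(data[i:i + 32], block))
    return bytes(out)


def split_into_streams(reads):
    """Group similar fields of aligned reads into separate streams.

    `reads` is a list of (start_position, bases) tuples. The stream
    names are illustrative, not the descriptors defined by MPEG-G.
    """
    return {
        "pos": [pos for pos, _ in reads],
        "len": [len(seq) for _, seq in reads],
        "seq": "".join(seq for _, seq in reads).encode("ascii"),
    }


def coverage_from_plaintext_streams(positions, lengths):
    """Attacker view: per-position read depth, computed without any key."""
    depth = Counter()
    for pos, length in zip(positions, lengths):
        for p in range(pos, pos + length):
            depth[p] += 1
    return depth


# Three made-up reads aligned at positions 0, 2 and 3.
reads = [(0, "ACGT"), (2, "GTAC"), (3, "TACG")]
streams = split_into_streams(reads)
key = b"secret"

# Only the sequence stream is encrypted; the other streams stay in the clear.
protected = {
    "pos": streams["pos"],
    "len": streams["len"],
    "seq": toy_encrypt(streams["seq"], key),
}

# The attacker reads only the plaintext streams.
depth = coverage_from_plaintext_streams(protected["pos"], protected["len"])
```

In this toy setting the attacker recovers the full coverage profile from the unencrypted streams alone. The attacks described in the paper go further and recover the encrypted content itself.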

Acknowledgements

The work presented in this paper has been partially supported by the Spanish Research Agency/ERDF (EU), through the project Secure Genomic Information Compression (GenCom, TEC2015-67774-C2-1-R, TEC2015-67774-C2-2-R) and by the Generalitat de Catalunya (2017 SGR 1749).

Author information

Corresponding author

Correspondence to Silvia Llorente.

About this article

Cite this article

Naro, D., Delgado, J. & Llorente, S. Side channel attack on a partially encrypted MPEG-G file. Multimed Tools Appl 80, 20599–20618 (2021). https://doi.org/10.1007/s11042-021-10720-7

Keywords

  • Encryption
  • Genomic information
  • Information leakage
  • MPEG-G