New genome sequencing technologies have decreased the cost of generating genomic data, thus increasing storage needs. The International Organization for Standardization (ISO) working group MPEG has developed a standard for genomic data compression with encryption features. The approach taken in the MPEG-G standard (ISO/IEC 23092) to compress genomic information is to group similar data into streams. Accordingly, one of the protection options considered was to encrypt each stream separately. In this paper, we show that, if streams are encrypted separately, an attacker can use an unencrypted stream to deduce the content of an encrypted one. To do so, we present two attacks, one based on signal processing and the other based on neural networks. The signal-based attack works only under unrealistic settings, whereas the neural network-based attack recovers data under realistic settings (in terms of read length and coverage). The presented results led MPEG to reconsider the encryption strategy before final publication of the standard and to discard the separate-stream encryption approach.
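The leakage argument can be made concrete with a toy sketch. It is neither of the two attacks evaluated in the paper, and it does not reproduce the actual MPEG-G stream layout. It assumes reads aligned to a reference, with read start positions forming one stream (treated as encrypted) and the offset of a mismatch within each read forming another stream (left in the clear). The function names `split_into_streams` and `infer_position_deltas`, the read length, and the single-variant setup are all hypothetical and chosen only for illustration. When two consecutive reads cover the same variant, the mismatch offset drops by exactly the amount the start position advanced, so the clear stream reveals differences of the encrypted one.

```python
from typing import List, Optional, Tuple

READ_LEN = 10  # hypothetical read length, far shorter than real reads


def split_into_streams(
    starts: List[int], variant_pos: int
) -> Tuple[List[int], List[Optional[int]]]:
    """Split aligned reads into a position stream and a mismatch-offset stream.

    A read that covers the variant records the variant's offset from its start;
    a read that misses it records None.
    """
    offsets = [
        variant_pos - s if s <= variant_pos < s + READ_LEN else None
        for s in starts
    ]
    return list(starts), offsets


def infer_position_deltas(offsets: List[Optional[int]]) -> List[Optional[int]]:
    """Recover start-position deltas using only the unencrypted offsets.

    If two consecutive reads both cover the single variant, the offset shrinks
    by exactly the amount the start position advanced.
    """
    deltas = []
    for prev, curr in zip(offsets, offsets[1:]):
        both_cover = prev is not None and curr is not None
        deltas.append(prev - curr if both_cover else None)
    return deltas


# Toy data (assumed): one variant at position 50, reads sorted by start position.
starts = [41, 43, 44, 48, 50, 60]
position_stream, offset_stream = split_into_streams(starts, variant_pos=50)

# The attacker sees only offset_stream; position_stream stands in for ciphertext.
leaked = infer_position_deltas(offset_stream)
true_deltas = [b - a for a, b in zip(position_stream, position_stream[1:])]

print(offset_stream)  # [9, 7, 6, 2, 0, None]
print(leaked)         # [2, 1, 4, 2, None]
print(true_deltas)    # [2, 1, 4, 2, 10]
```

Real data contains many variants, sequencing errors, and reads that share no variant, so this exact rule would not carry over unchanged. The sketch only illustrates why streams that describe the same reads can leak information about each other when they are protected separately.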
The work presented in this paper has been partially supported by the Spanish Research Agency/ERDF (EU), through the project Secure Genomic Information Compression (GenCom, TEC2015-67774-C2-1-R, TEC2015-67774-C2-2-R) and by the Generalitat de Catalunya (2017 SGR 1749).
Naro, D., Delgado, J. & Llorente, S. Side channel attack on a partially encrypted MPEG-G file. Multimed Tools Appl 80, 20599–20618 (2021). https://doi.org/10.1007/s11042-021-10720-7
- Genomic information
- Information leakage