Abstract
With the rapid development of genomic sequencing technology, the cost of obtaining and effectively analyzing personal genomic data has gradually fallen. As the analysis and use of genomic data have come into public view, the leakage of private genomic data has drawn the attention of researchers. Genomic data files have distinctive formats and are very large, yet existing genomic privacy protection schemes often fail to address security, availability, and efficiency together. In this paper, we analyze widely used genomic data file formats and design a hybrid encryption scheme for large genomic data files. First, we design a key agreement protocol based on RSA asymmetric cryptography. Second, we encrypt the genomic data with AES symmetric encryption, optimizing the block-wise (packet) processing of files and using multithreaded encryption, and we improve usability by having the computing platform assist with key management. A software implementation indicates that the scheme can be applied to the secure transmission of genomic data over a network and provides an efficient encryption method for protecting the privacy of genomic data.
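The general shape of such a hybrid scheme can be illustrated with a minimal Python sketch. It is not the authors' implementation: it assumes the third-party `cryptography` package, substitutes plain RSA-OAEP key wrapping for the paper's key agreement protocol, and assumes AES-GCM as the AES mode, since the abstract names neither. The function names, the chunk size, and the worker count are all hypothetical.

```python
import os
from concurrent.futures import ThreadPoolExecutor

from cryptography.hazmat.primitives import hashes
from cryptography.hazmat.primitives.asymmetric import padding, rsa
from cryptography.hazmat.primitives.ciphers.aead import AESGCM

CHUNK_SIZE = 1 << 20  # 1 MiB per chunk; illustrative value, not from the paper


def _oaep():
    # RSA-OAEP with SHA-256; an assumed padding choice for wrapping the AES key.
    return padding.OAEP(
        mgf=padding.MGF1(algorithm=hashes.SHA256()),
        algorithm=hashes.SHA256(),
        label=None,
    )


def encrypt_chunk(key, index, chunk):
    # Fresh random 96-bit nonce per chunk; the chunk index is authenticated
    # as associated data so that reordered chunks fail to decrypt.
    nonce = os.urandom(12)
    aad = index.to_bytes(8, "big")
    return nonce + AESGCM(key).encrypt(nonce, chunk, aad)


def decrypt_chunk(key, index, blob):
    nonce, ciphertext = blob[:12], blob[12:]
    return AESGCM(key).decrypt(nonce, ciphertext, index.to_bytes(8, "big"))


def hybrid_encrypt(data, public_key, workers=4, chunk_size=CHUNK_SIZE):
    # 1. Generate a random AES-256 key and wrap it under the RSA public key.
    key = AESGCM.generate_key(bit_length=256)
    wrapped_key = public_key.encrypt(key, _oaep())
    # 2. Split the data into chunks and encrypt them on a thread pool.
    chunks = [data[i:i + chunk_size] for i in range(0, len(data), chunk_size)]
    with ThreadPoolExecutor(max_workers=workers) as pool:
        blobs = list(pool.map(lambda p: encrypt_chunk(key, *p), enumerate(chunks)))
    return wrapped_key, blobs


def hybrid_decrypt(wrapped_key, blobs, private_key, workers=4):
    # Unwrap the AES key with the RSA private key, then decrypt chunks in order.
    key = private_key.decrypt(wrapped_key, _oaep())
    with ThreadPoolExecutor(max_workers=workers) as pool:
        parts = pool.map(lambda p: decrypt_chunk(key, *p), enumerate(blobs))
        return b"".join(parts)


if __name__ == "__main__":
    private_key = rsa.generate_private_key(public_exponent=65537, key_size=2048)
    payload = os.urandom(5 * CHUNK_SIZE + 123)  # stand-in for a genomic data file
    wrapped, blobs = hybrid_encrypt(payload, private_key.public_key())
    assert hybrid_decrypt(wrapped, blobs, private_key) == payload
```

The sketch holds the whole payload in memory for brevity; a version aimed at very large files would read and write chunks as a stream. Whether the thread pool yields a real speedup depends on the runtime and the cipher backend, and it was not measured here.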
Notes
1. Here \( K \) is the AES encryption key, \( K_{p} \) is the RSA public key, \( K_{s} \) is the RSA private key, \( m \) is the plaintext, and \( H(\cdot) \) denotes the hash value of its argument.
Acknowledgment
This project is supported by the National Key Research and Development Program of China (No. 2016YFC1000307), the National Natural Science Foundation of China (No. 61571024, No. 61971021), and the Aeronautical Science Foundation of China (No. 2018ZC51016).
Copyright information
© 2019 Springer Nature Singapore Pte Ltd.
About this paper
Cite this paper
Jiang, Y., Shang, T., Liu, J., Cao, Z., Geng, Y. (2019). An Efficient Hybrid Encryption Scheme for Large Genomic Data Files. In: Ning, H. (eds) Cyberspace Data and Intelligence, and Cyber-Living, Syndrome, and Health. CyberDI 2019, CyberLife 2019. Communications in Computer and Information Science, vol 1137. Springer, Singapore. https://doi.org/10.1007/978-981-15-1922-2_15
DOI: https://doi.org/10.1007/978-981-15-1922-2_15
Publisher Name: Springer, Singapore
Print ISBN: 978-981-15-1921-5
Online ISBN: 978-981-15-1922-2
eBook Packages: Computer Science, Computer Science (R0)