Cloud-Assisted Read Alignment and Privacy

  • Maria Fernandes
  • Jérémie Decouchant
  • Francisco M. Couto
  • Paulo Esteves-Verissimo
Conference paper
Part of the Advances in Intelligent Systems and Computing book series (AISC, volume 616)


Thanks to rapid advances in sequencing technologies, genomic data is now being produced at an unprecedented rate. To adapt to this growth, several algorithms and paradigm shifts have been proposed to increase the throughput of the classical DNA workflow, e.g. by relying on the cloud to perform CPU-intensive operations. However, the scientific community has raised the alarm over the privacy attacks that can be mounted against genomic data. In this paper, we review the state of the art in cloud-based alignment algorithms that were developed for performance. We then present several privacy-preserving mechanisms that have been, or could be, used to align reads at an incremental performance cost. Finally, we argue for the use of risk analysis throughout the DNA workflow to strike a balance between performance and data protection.


Read alignment, Cloud computing, Genomic data privacy



This work was supported by the Fonds National de la Recherche Luxembourg (FNR) through PEARL grant FNR/P14/8149128, and by the Fundação para a Ciência e para a Tecnologia (FCT) through funding of the LaSIGE Research Unit, ref. UID/CEC/00408/2013.



Copyright information

© Springer International Publishing AG 2017

Authors and Affiliations

  • Maria Fernandes (1)
  • Jérémie Decouchant (1)
  • Francisco M. Couto (2)
  • Paulo Esteves-Verissimo (1)
  1. SnT – Interdisciplinary Centre for Security, Reliability and Trust, University of Luxembourg, Luxembourg, Luxembourg
  2. LASIGE, Faculdade de Ciências, Universidade de Lisboa, Lisboa, Portugal
