Differentially-Private Logistic Regression for Detecting Multiple-SNP Association in GWAS Databases

  • Conference paper
Privacy in Statistical Databases (PSD 2014)

Part of the book series: Lecture Notes in Computer Science ((LNISA,volume 8744))

Abstract

Following the publication of an attack on genome-wide association studies (GWAS) data proposed by Homer et al., considerable attention has been given to developing methods for releasing GWAS data in a privacy-preserving way. Here, we develop an end-to-end differentially private method for solving regression problems with convex penalty functions and for selecting the penalty parameters by cross-validation. In particular, we focus on penalized logistic regression with elastic-net regularization, a method widely used in GWAS analyses to identify disease-causing genes. We show how a differentially private procedure for penalized logistic regression with elastic-net regularization can be applied to the analysis of GWAS data, and we evaluate our method’s performance.
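
As a rough illustration of the kind of objective the abstract refers to, the sketch below evaluates an elastic-net penalized logistic loss in plain Python. The parametrization (`lam` for the overall penalty strength, `alpha` for the L1/L2 mix), the label coding in {-1, +1}, and all function names are assumptions made for this example and are not taken from the paper. The optional linear term `b` only marks where a random perturbation would enter in objective-perturbation approaches such as those in the cited literature on private empirical risk minimization; no noise calibration is attempted here, so the code as written provides no privacy guarantee.

```python
import math


def log1p_exp_neg(m):
    """Numerically stable evaluation of log(1 + exp(-m))."""
    if m > 0:
        return math.log1p(math.exp(-m))
    return -m + math.log1p(math.exp(m))


def elastic_net_logistic_objective(w, X, y, lam, alpha, b=None):
    """Average logistic loss plus an elastic-net penalty.

    w     : list of coefficients (one per SNP feature)
    X     : list of feature rows, each the same length as w
    y     : list of labels coded as -1 (control) or +1 (case); assumed coding
    lam   : overall penalty strength (hypothetical name)
    alpha : mixing weight in [0, 1]; 1 is pure L1, 0 is pure L2 (hypothetical name)
    b     : optional perturbation vector; adds (b . w) / n to the objective.
            Drawing b so that a privacy guarantee holds is NOT done here.
    """
    n = len(X)
    loss = 0.0
    for x_i, y_i in zip(X, y):
        margin = y_i * sum(w_j * x_ij for w_j, x_ij in zip(w, x_i))
        loss += log1p_exp_neg(margin)
    loss /= n

    l1 = sum(abs(w_j) for w_j in w)
    l2_sq = sum(w_j * w_j for w_j in w)
    penalty = lam * (alpha * l1 + 0.5 * (1.0 - alpha) * l2_sq)

    perturbation = 0.0
    if b is not None:
        perturbation = sum(b_j * w_j for b_j, w_j in zip(b, w)) / n

    return loss + penalty + perturbation


# Toy data: two individuals, two SNP features (values are made up).
X_toy = [[0.0, 1.0], [2.0, 1.0]]
y_toy = [1, -1]
value = elastic_net_logistic_objective([0.1, -0.2], X_toy, y_toy, lam=0.5, alpha=0.5)
```

In a cross-validation setting, one would evaluate fitted models over a grid of (`lam`, `alpha`) pairs; doing that selection privately requires additional machinery that this sketch does not cover.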

References

  1. Austin, E., Pan, W., Shen, X.: Penalized regression and risk prediction in genome-wide association studies. Statistical Analysis and Data Mining 6(4) (August 2013)

  2. Cho, S., et al.: Elastic-net regularization approaches for genome-wide association studies of rheumatoid arthritis. BMC Proceedings 3(suppl. 7), S25 (2009)

  3. Homer, N., et al.: Resolving individuals contributing trace amounts of DNA to highly complex mixtures using high-density SNP genotyping microarrays. PLoS Genetics 4(8), e1000167 (2008)

  4. Couzin, J.: Whole-genome data not anonymous, challenging assumptions. Science 321(5894), 1278 (2008)

  5. Zerhouni, E.A., Nabel, E.G.: Protecting aggregate genomic data. Science 322(5898), 44 (2008)

  6. Dwork, C., McSherry, F., Nissim, K., Smith, A.: Calibrating noise to sensitivity in private data analysis. In: Halevi, S., Rabin, T. (eds.) TCC 2006. LNCS, vol. 3876, pp. 265–284. Springer, Heidelberg (2006)

  7. Uhler, C., Slavkovic, A.B., Fienberg, S.E.: Privacy-preserving data sharing for genome-wide association studies. Journal of Privacy and Confidentiality 5(1), 137–166 (2013)

  8. Johnson, A., Shmatikov, V.: Privacy-preserving data exploration in genome-wide association studies. In: Proceedings of the 19th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pp. 1079–1087 (2013)

  9. Yu, F., et al.: Scalable Privacy-Preserving Data Sharing Methodology for Genome-Wide Association Studies. Journal of Biomedical Informatics (February 2014)

  10. Kifer, D., Smith, A., Thakurta, A.: Private convex empirical risk minimization and high-dimensional regression. Journal of Machine Learning Research - Proceedings Track 23, 25.1–25.40 (2012)

  11. Chaudhuri, K., Vinterbo, S.A.: A stability-based validation procedure for differentially private machine learning. In: Advances in Neural Information Processing Systems, pp. 1–19 (2013)

  12. Chaudhuri, K., Monteleoni, C., Sarwate, A.D.: Differentially private empirical risk minimization. JMLR 12(7), 1069–1109 (2011)

  13. Laurent, B., Massart, P.: Adaptive estimation of a quadratic functional by model selection. Annals of Statistics 28(5), 1302–1338 (2000)

  14. Wright, F.A., et al.: Simulating association studies: a data-based resampling method for candidate regions or whole genome scans. Bioinformatics 23(19), 2581–2588 (2007)

  15. Malaspinas, A.S., Uhler, C.: Detecting epistasis via Markov bases. Journal of Algebraic Statistics 2(1), 36–53 (2010)

  16. Gómez, E., Gómez-Villegas, M.A., Marín, J.M.: A multivariate generalization of the power exponential family of distributions. Communications in Statistics - Theory and Methods 27(3), 589–600 (1998)
Copyright information

© 2014 Springer International Publishing Switzerland

About this paper

Cite this paper

Yu, F., Rybar, M., Uhler, C., Fienberg, S.E. (2014). Differentially-Private Logistic Regression for Detecting Multiple-SNP Association in GWAS Databases. In: Domingo-Ferrer, J. (eds) Privacy in Statistical Databases. PSD 2014. Lecture Notes in Computer Science, vol 8744. Springer, Cham. https://doi.org/10.1007/978-3-319-11257-2_14

  • DOI: https://doi.org/10.1007/978-3-319-11257-2_14

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-11256-5

  • Online ISBN: 978-3-319-11257-2

  • eBook Packages: Computer Science, Computer Science (R0)