Differentially-Private Logistic Regression for Detecting Multiple-SNP Association in GWAS Databases
Following the publication of an attack on genome-wide association studies (GWAS) data proposed by Homer et al., considerable attention has been given to developing methods for releasing GWAS data in a privacy-preserving way. Here, we develop an end-to-end differentially private method for solving regression problems with convex penalty functions and for selecting the penalty parameters by cross-validation. In particular, we focus on penalized logistic regression with elastic-net regularization, a method widely used in GWAS analyses to identify disease-causing genes. We show how a differentially private procedure for penalized logistic regression with elastic-net regularization can be applied to the analysis of GWAS data, and we evaluate our method’s performance.
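For readers unfamiliar with the non-private baseline, the following is a minimal sketch of an elastic-net penalized logistic regression objective. The function name, the label coding in {-1, +1}, and the (lam, alpha) parameterization of the penalty are assumptions made for illustration and may differ from the parameterization used in the paper. The sketch does not include any privacy mechanism or the cross-validation step.

```python
import math

def elastic_net_logistic_objective(w, X, y, lam, alpha):
    """Average logistic loss plus an elastic-net penalty (illustrative only).

    w     : list of coefficients, one per feature (e.g. per SNP)
    X     : list of feature rows (e.g. genotypes coded 0/1/2)
    y     : list of labels in {-1, +1} (assumed coding: control / case)
    lam   : overall penalty strength (assumed name)
    alpha : mixing weight in [0, 1]; 1 gives lasso, 0 gives ridge (assumed name)
    """
    n = len(X)
    loss = 0.0
    for xi, yi in zip(X, y):
        margin = yi * sum(wj * xj for wj, xj in zip(w, xi))
        # log(1 + exp(-margin)), written so exp never overflows
        loss += max(-margin, 0.0) + math.log1p(math.exp(-abs(margin)))
    l1 = sum(abs(wj) for wj in w)      # lasso part
    l2 = sum(wj * wj for wj in w)      # ridge part
    return loss / n + lam * (alpha * l1 + 0.5 * (1.0 - alpha) * l2)

# Tiny usage example with made-up genotypes for two individuals and two SNPs.
X = [[0, 1], [2, 1]]
y = [1, -1]
value = elastic_net_logistic_objective([0.5, -0.25], X, y, lam=0.1, alpha=0.5)
```

Setting `alpha` to 1 or 0 recovers the lasso and ridge penalties listed in the keywords as special cases.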
Keywords: Differential privacy, genome-wide association studies (GWAS), logistic regression, elastic-net, ridge regression, lasso, cross-validation, single nucleotide polymorphism (SNP)
- 1. Austin, E., Pan, W., Shen, X.: Penalized regression and risk prediction in genome-wide association studies. Statistical Analysis and Data Mining 6(4) (August 2013)
- 2. Cho, S., et al.: Elastic-net regularization approaches for genome-wide association studies of rheumatoid arthritis. BMC Proceedings 3(suppl. 7), S25 (2009)
- 3. Homer, N., et al.: Resolving individuals contributing trace amounts of DNA to highly complex mixtures using high-density SNP genotyping microarrays. PLoS Genetics 4(8), e1000167 (2008)
- 7. Uhler, C., Slavkovic, A.B., Fienberg, S.E.: Privacy-preserving data sharing for genome-wide association studies. Journal of Privacy and Confidentiality 5(1), 137–166 (2013)
- 8. Johnson, A., Shmatikov, V.: Privacy-preserving data exploration in genome-wide association studies. In: Proceedings of the 19th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pp. 1079–1087 (2013)
- 9. Yu, F., et al.: Scalable Privacy-Preserving Data Sharing Methodology for Genome-Wide Association Studies. Journal of Biomedical Informatics (February 2014)
- 10. Kifer, D., Smith, A., Thakurta, A.: Private convex empirical risk minimization and high-dimensional regression. Proceedings of Journal of Machine Learning Research - Proceedings Track 23, 25.1–25.40 (2012)
- 11. Chaudhuri, K., Vinterbo, S.A.: A stability-based validation procedure for differentially private machine learning. In: Advances in Neural Information Processing Systems, pp. 1–19 (2013)