A differential privacy DNA motif finding method based on closed frequent patterns

Wu, Xiang; Wei, Yuyang; Mao, Yaqing; Wang, Liang

doi:10.1007/s10586-017-1691-9

Published: 12 January 2018

Volume 22, pages 2907–2919, (2019)

Cluster Computing

Xiang Wu¹ (ORCID: orcid.org/0000-0001-5190-9781), Yuyang Wei¹, Yaqing Mao¹ & Liang Wang¹

485 Accesses
4 Citations

Abstract

DNA motif finding is a basic research method in bioinformatics and is important for studying the mechanisms that regulate gene expression and for discovering biologically functional sites. However, because DNA data are highly sensitive, the risk of privacy disclosure during motif finding has become a bottleneck in gene research. Traditional privacy-preserving data mining methods cannot handle DNA sequences directly, and existing private motif finding methods usually reduce the utility of the results. To solve these problems, we propose a high-utility motif finding algorithm based on \(\epsilon\)-differential privacy, a rigorous definition of privacy that provides meaningful guarantees in the presence of arbitrary external information. Our solution uses the closed frequent pattern set to reduce redundant motifs in the result set and to obtain accurate motif results while satisfying \(\epsilon\)-differential privacy. Furthermore, a post-processing method based on the best linear unbiased estimate is used to optimize the utility of the noisy consolidated motif supports. Experiments on real-life DNA sequence datasets confirm that our algorithm is superior to the existing algorithms in terms of utility.
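To make two ideas from the abstract concrete, the following is a minimal Python sketch of closed frequent pattern filtering and of Laplace perturbation of pattern supports. It is not the authors' algorithm, which the abstract does not specify in detail. The function names (`substring_supports`, `closed_frequent`, `laplace_noise`, `noisy_supports`), the use of contiguous substrings as candidate motifs, and the even split of the privacy budget are all assumptions made for illustration. The best linear unbiased estimate post-processing step is not covered.

```python
import random
from collections import defaultdict


def substring_supports(sequences, min_len, max_len):
    """Support of a pattern = number of sequences that contain it at least once."""
    support = defaultdict(int)
    for seq in sequences:
        seen = set()  # count each pattern at most once per sequence
        for k in range(min_len, max_len + 1):
            for i in range(len(seq) - k + 1):
                seen.add(seq[i:i + k])
        for pattern in seen:
            support[pattern] += 1
    return dict(support)


def closed_frequent(support, min_support):
    """Keep frequent patterns that have no longer super-pattern with equal support.

    Closedness is judged only among the enumerated pattern lengths.
    """
    frequent = {p: s for p, s in support.items() if s >= min_support}
    return {
        p: s
        for p, s in frequent.items()
        if not any(len(q) > len(p) and p in q and frequent[q] == s for q in frequent)
    }


def laplace_noise(scale, rng):
    """Laplace(0, scale) sample, drawn as the difference of two exponentials."""
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)


def noisy_supports(patterns, epsilon, rng):
    """Perturb each support with Laplace noise.

    Assumption: the pattern set is treated as fixed. Adding or removing one
    sequence changes each of the k supports by at most 1, so the L1
    sensitivity is at most k and the noise scale is k / epsilon.
    """
    if not patterns:
        return {}
    scale = len(patterns) / epsilon
    return {p: s + laplace_noise(scale, rng) for p, s in patterns.items()}


# Hypothetical toy data, not taken from the paper's experiments.
sequences = ["ACGT", "ACGA", "TCGA"]
closed = closed_frequent(substring_supports(sequences, 2, 3), min_support=2)
released = noisy_supports(closed, epsilon=1.0, rng=random.Random(0))
```

In this sketch the choice of which patterns to release still depends on the raw data, so the pipeline as a whole does not satisfy \(\epsilon\)-differential privacy on its own; only the count perturbation step illustrates the Laplace mechanism. A complete method would also need to privatize the selection step, and the standard `random` module is not a cryptographically secure noise source.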




Author information

Authors and Affiliations

School of Medical Informatics, Xuzhou Medical University, Xuzhou, 221004, Jiangsu, China
Xiang Wu, Yuyang Wei, Yaqing Mao & Liang Wang


Corresponding author

Correspondence to Xiang Wu.


About this article

Cite this article

Wu, X., Wei, Y., Mao, Y. et al. A differential privacy DNA motif finding method based on closed frequent patterns. Cluster Comput 22 (Suppl 2), 2907–2919 (2019). https://doi.org/10.1007/s10586-017-1691-9


Received: 22 October 2017
Revised: 22 December 2017
Accepted: 29 December 2017
Published: 12 January 2018
Issue Date: March 2019
DOI: https://doi.org/10.1007/s10586-017-1691-9
