Statistical Database Modeling for Privacy Preserving Database Generation

Wu, Xintao; Wang, Yongge; Zheng, Yuliang

doi:10.1007/11425274_40

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 3488)

Included in the following conference series:

International Symposium on Methodologies for Intelligent Systems

Abstract

Testing of database applications is of great importance. Although various studies have investigated testing techniques for database design, relatively few efforts have explicitly addressed the testing of database applications, which requires a large amount of representative data. Because testing over live production databases is often infeasible due to the high risk of disclosing confidential information or incorrectly updating real data, in this paper we investigate the problem of generating a synthetic database based on a priori knowledge about the production database. Our approach is to fit a general location model using various characteristics (e.g., constraints, statistics, rules) extracted from the production database and then generate synthetic data using the learnt model. Because the extracted characteristics may contain information that an attacker could use to derive confidential information, we present a disclosure analysis method based on the cell suppression technique. Our method is effective and efficient at removing aggregate private information during data generation.
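The pipeline outlined in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation. It assumes a simplified general location model with a single numeric attribute (cell probabilities, per-cell means, one pooled variance), and it applies only primary suppression of small cells, whereas a full cell suppression method would also consider complementary suppression. The function names, the `min_cell_count` threshold, and the sample rows are all hypothetical.

```python
import random
from collections import defaultdict


def fit_general_location_model(rows, min_cell_count=3):
    """Fit a simplified general location model.

    rows: list of (cell, value) pairs, where cell is a tuple of
    categorical values and value is a float.
    Returns (cell probabilities, per-cell means, pooled variance).
    """
    by_cell = defaultdict(list)
    for cell, value in rows:
        by_cell[cell].append(value)

    # Assumption: primary suppression only. Cells with fewer than
    # min_cell_count records are dropped before any statistic is kept.
    kept = {c: v for c, v in by_cell.items() if len(v) >= min_cell_count}
    total = sum(len(v) for v in kept.values())
    if total == 0:
        raise ValueError("every cell was suppressed; nothing to fit")

    probs = {c: len(v) / total for c, v in kept.items()}
    means = {c: sum(v) / len(v) for c, v in kept.items()}

    # Pooled within-cell variance, shared by all cells.
    ss = sum((x - means[c]) ** 2 for c, v in kept.items() for x in v)
    dof = total - len(kept)
    variance = ss / dof if dof > 0 else 0.0
    return probs, means, variance


def generate(model, n, seed=0):
    """Draw n synthetic (cell, value) rows from the fitted model."""
    probs, means, variance = model
    rng = random.Random(seed)
    cells = list(probs)
    weights = [probs[c] for c in cells]
    sigma = variance ** 0.5
    return [
        (cell, rng.gauss(means[cell], sigma))
        for cell in rng.choices(cells, weights=weights, k=n)
    ]


# Hypothetical "production" rows: (state, tier) cell plus an amount.
production = [
    (("NC", "gold"), 120.0), (("NC", "gold"), 130.0), (("NC", "gold"), 125.0),
    (("NY", "basic"), 40.0), (("NY", "basic"), 50.0),
    (("NY", "basic"), 45.0), (("NY", "basic"), 45.0),
    (("NY", "gold"), 900.0),  # singleton cell, suppressed during fitting
]

model = fit_general_location_model(production)
synthetic = generate(model, 5)
```

In this sketch the singleton cell never reaches the model, so no synthetic row can reproduce it. The trade-off is that the synthetic data under-represents rare combinations.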

This research was supported by USA National Science Foundation Grant CCR-0310974.

Author information

Authors and Affiliations

UNC Charlotte,
Xintao Wu, Yongge Wang & Yuliang Zheng

Editor information

Editors and Affiliations

LIRIS - UFR d’Informatique, Université Claude Bernard Lyon 1, 43, boulevard du 11 novembre 1918, 69622, Villeurbanne, France
Mohand-Said Hacid
Department of Computer Science, State University of New York, 12222, Albany, NY, USA
Neil V. Murray
Department of Computer Science, University of North Carolina, 28223, Charlotte, NC, USA
Zbigniew W. Raś
Shimane University, 89-1 Enya-cho Izumo, 6938501, Shimane, Japan
Shusaku Tsumoto

About this paper

Cite this paper

Wu, X., Wang, Y., Zheng, Y. (2005). Statistical Database Modeling for Privacy Preserving Database Generation. In: Hacid, MS., Murray, N.V., Raś, Z.W., Tsumoto, S. (eds) Foundations of Intelligent Systems. ISMIS 2005. Lecture Notes in Computer Science, vol 3488. Springer, Berlin, Heidelberg. https://doi.org/10.1007/11425274_40

DOI: https://doi.org/10.1007/11425274_40
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-25878-0
Online ISBN: 978-3-540-31949-8
eBook Packages: Computer Science, Computer Science (R0)
